General Information

A SESYNC Data Skills Workshop provides researchers from the socio-environmental synthesis community with hands-on training in open source tools for collaborative coding, data management, analysis, visualization, and dissemination. The goal of this two-day workshop is to introduce novice and intermediate scientific coders to concepts, skills and approaches for data-driven research.

The first day of the workshop (Tuesday) will utilize R and tools available through the RStudio development environment. The second day (Thursday) will introduce Python and several command line tools. The schedule below provides an overview of the specific topics we will address through a series of 8 lessons that integrate live-coding and trainee challenge exercises.

Registration is open to any faculty or research staff in the Behavioral and Social Sciences College of the University of Maryland. Participants are welcome to attend either one or both days.

Instructors:

  • Ian Carroll, Data Scientist @SESYNC
  • Mary Shelley, Associate Director of Synthesis @SESYNC

When:

The workshop spans two full but non-consecutive days during Winter Term.

Tuesday, January 16, 2018 and Thursday, January 18, 2018

Where:

1101 Morrill Hall

Get directions with OpenStreetMap or Google Maps.

Requirements:

Participants must bring a laptop with a full keyboard and mouse or trackpad (not a tablet, iPad, etc.) and have a fully functional browser installed (e.g. Chrome, Firefox, Safari, or Internet Explorer).

Contact:

Please email icarroll@sesync.org with any questions, or for information not covered here.

Schedule

Please note that we plan to end each day with enough time to answer longer follow-up questions individually, as needed.

Tuesday
  9:00 Introductions & Orientation
  9:15 Basic R
  10:45 Coffee Break
  11:00 Model Building Mini-Languages
  12:15 pm Lunch Break
  1:00 Data Manipulation with “dplyr”
  2:30 Stretch Break
  2:45 Visualizations with “ggplot2”
  4:15 FIN

Wednesday
  NOT MEETING

Thursday
  9:00 [Re-]Introductions & Orientation
  9:15 git and More Tools in the Shell
  10:30 Coffee Break
  10:45 Basic Python
  12:15 pm Lunch Break
  1:00 Software Portals (PyPI and CRAN)
  1:30 Web Services and APIs with Python
  2:30 Stretch Break
  2:45 Social Media and other APIs
  4:15 FIN

Setup

Software

Use the default installation options for all packages. For Windows users, an installer for each item is available at the given download site. Mac users are encouraged to use Homebrew – the missing package manager for OS X – via the shell, although the download links also provide .dmg installers.

  • git
    Download: https://git-scm.com/downloads
    Homebrew: brew install git
  • R
    Download: https://cran.rstudio.com/
    Homebrew: brew install r
  • RStudio (free version)
    Download: https://www.rstudio.com/products/rstudio/download2/
    Use the downloader.
  • Python 3.x
    Download: https://www.python.org/downloads/
    Homebrew: brew install python3
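
Once the items above are installed, one way to confirm that the command line tools can be found is a short Python check like the sketch below. It is only an illustration, not part of the official setup: the command names git, R and python3 are assumptions (on Windows, Python is often installed as python, and R may not be on the PATH even when it is installed correctly), and find_tools is a hypothetical helper name. RStudio is a desktop application and is not checked here.

```python
import shutil

# Command names are assumptions: on Windows, Python is often "python"
# rather than "python3", and R may not be on the PATH at all.
TOOLS = ["git", "R", "python3"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A tool reported as "not found on PATH" may still be installed; try opening it from its usual launcher before reinstalling.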

The following R packages need to be installed after R and RStudio are installed. Open RStudio and, for each package below, type install.packages("%package%") at the prompt, replacing %package% with the package name (keep the quotation marks), and press return. Follow all prompts.

  • tidyr
  • dplyr
  • magrittr
  • stringr
  • ggplot2
  • data.table
  • lme4

The following Python packages need to be installed after Python is installed. Open a shell/terminal and, for each package below, run pip3 install %package%, replacing %package% with the package name.

  • pandas
  • jupyterlab
  • beautifulsoup4
  • requests
  • census
  • ggplot

After installing jupyterlab, run jupyter serverextension enable --py jupyterlab --sys-prefix in the shell/terminal to complete the installation. JupyterLab runs through your browser. To launch it, enter jupyter lab in the shell/terminal; to stop it, press Ctrl-C in that same shell/terminal.
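
To double-check that the Python packages listed above were installed for the Python 3 interpreter you plan to use, a small script along these lines may help. It is a sketch under stated assumptions rather than an official part of the setup: missing_packages is a hypothetical helper name, and the mapping from pip names to import names reflects how these packages are normally imported (for example, beautifulsoup4 is imported as bs4).

```python
import importlib.util

# pip package name -> module name used in "import ..." (the two can differ).
PACKAGES = {
    "pandas": "pandas",
    "jupyterlab": "jupyterlab",
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "census": "census",
    "ggplot": "ggplot",
}


def missing_packages(packages):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Not found, try pip3 install for:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Save the script to a file and run it with python3 so that it checks the same interpreter that pip3 installs into. The check only locates each package; it does not import it or confirm that it works.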

Acknowledgements

Portions of the instructional materials, along with the structure of the curriculum and the teaching approach, are adapted from Data Carpentry and Software Carpentry.