General Information

A SESYNC Data Skills Workshop provides researchers from the socio-environmental synthesis community with hands-on training in open source tools for collaborative coding, data management, analysis, visualization, and dissemination. The goal of this one-day workshop is to introduce novice and intermediate scientific coders to concepts, skills and approaches for data-driven research, while relying on tools available through the RStudio development environment. The agenda provides an overview of the specific topics we will address through a series of four lessons that integrate live-coding and trainee challenge exercises. Registration is open to graduate students and researchers working in the biological sciences at the University of Maryland’s College of Computer, Mathematical and Natural Sciences (CMNS).

SESYNC Instructors:
Ian Carroll, Data Scientist
Mary Shelley, Associate Director of Synthesis
Mike Smorul, Associate Director of Cyberinfrastructure

When:
Thursday, Aug 25, 2016

Where:
Bioscience Research Building Room 1103
4066 Campus Drive
College Park, MD 20742

Get directions with OpenStreetMap or Google Maps.

Requirements:
Participants must bring a laptop running Mac, Linux, or Windows (not a tablet, Chromebook, etc.). The software noted below must be installed before the workshop starts. Contact icarroll@sesync.org with installation questions at least two business days before the event.

Contact:
Please email icarroll@sesync.org with any questions or for information not covered here.

Schedule

9:00 am Welcome and Overview of SESYNC
9:15 Reproducible Workflows in RStudio
10:30 Break
10:45 Manipulating Tabular Data
12:30 Lunch Break
1:30 pm Introduction to ggplot
3:00 Break
3:15 Databases to Documents with rmarkdown
4:45 Wrap-Up & Review
5:00 Fin

Pre-Arrival Installations & Downloads

To participate, you will need working copies of the software described below. Please make sure to install and/or download everything before the start of the short course.

GitHub

If you do not already have a GitHub account, please create one at https://www.github.com. Students and educators with a .edu e-mail address are eligible for free developer tools through GitHub’s Student Developer Pack.

Software

The table below lists the software we will use in this short course. Unless otherwise noted, please use the default installation options; this is especially important for git. Windows users can find an installer for each item at the listed download site. Mac users are encouraged to use Homebrew, "the missing package manager for OS X," from the shell: most packages below can be installed by typing brew install %package% in the Terminal and pressing return, but packages marked with an * require brew cask install %package%. Ubuntu users may install from the shell with sudo apt-get install %package%; other Linux users are on their own.

Software   Download Site                                         Homebrew Package(s)   Aptitude Package(s)
git        https://git-scm.com/downloads                         git                   git
R          https://cran.rstudio.com/                             r                     r-base
RStudio    https://www.rstudio.com/products/rstudio/download2/   rstudio*              n/a
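Once everything is installed, a quick check from the RStudio Console can confirm that R is running and that git is visible to it. The snippet below is a minimal sketch, not an official workshop step; it assumes git was installed with default options so that it lands on the system PATH, which may require restarting RStudio after installation.

```r
# Print the version of R that RStudio is running
print(R.version.string)

# Sys.which() returns the full path to the git executable, or "" if git is not on the PATH
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  message("git found at ", git_path)
} else {
  message("git not found; check the installation and restart RStudio")
}
```

If git is reported as missing on Windows, re-running the git installer with its default options is a reasonable first thing to try before contacting the instructors.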

R Packages

The following R packages (i.e., add-on pieces of software) need to be installed. Open RStudio and, for each package listed below, type install.packages("%package%") in the Console (where you see the > prompt) and press return. For example, to install the tidyr package, type install.packages("tidyr") and follow any instructions that appear.

  • tidyr
  • ggplot2
  • RSQLite
  • rmarkdown
  • stargazer
  • gridExtra

Acknowledgements & Support

Portions of the instructional materials are adapted from Data Carpentry and Software Carpentry. The structure of the curriculum and the teaching style are informed by Software Carpentry.