General Information

The Summer Institute of 2019 brings together ten science teams for a short course on data and software skills in socio-environmental synthesis. Through hands-on tutorials and project consultation, SESYNC staff will aim to accelerate your team’s adoption of cyber resources in all phases of data-driven research and dissemination.

Participants should expect to:

  • learn new scientific computing skills
  • overcome specific or conceptual project hurdles
  • gain coding confidence
  • have fun

Instructors:

  • Ian Carroll, Data Scientist
  • Rachael Blake, Data Scientist
  • Benoit Parmentier, Data Scientist
  • Kelly Hondula, rOpenSci Fellow

When:

Tuesday, July 23, 2019 to Friday, July 26, 2019

Optional day for basic R training: Monday, July 22

Where:

1 Park Place, Suite 300
Annapolis, MD 21401

Get directions with OpenStreetMap or Google Maps.

Contact:

Please email icarroll@sesync.org with any questions or for information not covered here.

Requirements

  • Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) with a full-featured browser (not Microsoft Edge).
  • At least one team member must bring data for the mini-project; sample or incomplete data is okay.
  • After the course, participants must complete a reimbursement form to recover allowed travel expenses.

Schedule

SESYNC provides refreshments at the 10:30 am break, an on-site lunch at 12:30 pm, and afternoon snacks at the 3:30 pm break. Participants are responsible for their own breakfast and dinner arrangements (we can make recommendations).

Day/Room    Time      Session                                       Instructor  Tools
[Monday]    9:00      Introduction to the RStudio IDE
            9:15      Pseudo-coding Exercise
            9:45      Base R                                        Ian         R
            10:30     Coffee + Tea Break
            10:45     Base R (continued)                            Ian         R
            12:00     Pair-coding Exercise
            12:30     Lunch
            13:30     Visualizing Tabular Data                      Rachael     R > ggplot2
            15:30     Snack Break
            15:45     Scripting Exercise
Tuesday     9:00      Welcome and Overview of SESYNC                Jon
            9:15      Collaborative & Reproducible Research         Ian         git, GitHub
            10:30     Coffee + Tea Break
            10:45     Introduce Coaches & data2doc
            11:45     Meet the Teams
            12:30     Lunch
Blue Room   13:30     Manipulating Tabular Data (R)                 Kelly       R > dplyr
Green Room  13:30     Manipulating Tabular Data (Python)            Benoit      Python > pandas
            15:15     About Homework & GitHub
            15:30     Snack Break
            15:45     data2doc
            17:00     Reception (with tasty beverages, etc.)
            Homework  Lesson 3 & Lesson 4 (R)/5 (Python) Exercises
Wednesday   9:00      Exercise Review
            9:15      Regression                                    Ian         R > nlme
            10:30     Coffee + Tea Break
            10:45     Smart and Interactive Documents               Kelly       R > rmarkdown, R > shiny
            12:30     Lunch
            13:30     data2doc
            15:15     Mini-project Updates & Discussion
            15:30     Snack Break
            15:45     data2doc
            Homework  Lesson 6 & Lesson 7 Exercises
Thursday    9:00      Exercise Review
            9:15      Online Data                                   Ian         Python > requests
            10:30     Coffee + Tea Break
            10:45     Geospatial Data                               Benoit      R > sf, R > raster
            12:30     Lunch
            13:30     data2doc
            15:15     Mini-project Updates & Discussion
            15:30     Snack Break
            15:45     data2doc
            Homework  Lesson 8 & Lesson 9 Exercises
Friday      9:00      Exercise Review
Blue Room   9:15      Structure for Unstructured Data               Ian
Green Room  9:15      Relational Databases Q&A                      Kelly       SQL, R > dbplyr
            10:30     Coffee + Tea Break
            10:45     Documenting and Publishing Data               Rachael     R > dataspice
            12:30     Lunch + data2doc
            14:30     Team Presentations (5 x 10 min)
            15:30     Snack Break
            15:45     Team Presentations (5 x 10 min)
            Homework  Lesson 11 Exercises

Software

The workshop uses RStudio and Jupyter, as well as many packages and dependencies associated with these two Integrated Development Environments (IDEs). SESYNC provides a cloud platform capable of supporting the software needs for the short course, so there is nothing for you to install in advance. During and after the course, you will be able to install any or all of this software (it is all free and open source) on your own machines. Feel free to request assistance any time during the course with installing the listed software on your laptop.

The table and lists below should help you find the right way to install the software, depending on your operating system. Both Windows and macOS users can install from the listed “Download Site”, or by following instructions given there. Linux (and optionally macOS) users should use a package manager—your Linux distro’s native one, or Homebrew on macOS—where possible. The GDAL/OGR downloads are not essential for using spatial libraries with R installed through the given download site.

Software     Homebrew Package(s)  Aptitude Package(s)  Download Site
git          git                  git                  https://git-scm.com/downloads
R            r                    r-base               https://cran.rstudio.com/
RStudio                                                https://www.rstudio.com/products/rstudio/download2/
Python 3.x   python3              python3              https://www.python.org/downloads/
Jupyter Lab                                            http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html
GDAL/OGR     gdal, geos           gdal-bin [1]         https://trac.osgeo.org/osgeo4w/

[1] Ubuntu users will need to add the UbuntuGIS repository before running apt-get install gdal-bin.

The following R packages need to be installed. Open RStudio and, for each package below, type install.packages("%package%") at the prompt, replacing %package% with the package name and keeping the quotation marks, then press return. Follow all prompts.

  • tidyr
  • ggplot2
  • dplyr
  • raster
  • sf
  • sp
  • shiny
  • leaflet
  • rmarkdown
  • lme4
  • rstanarm
  • data.table

The following Python packages need to be installed. From a command prompt, type pip3 install %package% for each package below and press return. Follow all prompts.

  • jupyterlab
  • numpy
  • scipy
  • pandas
  • beautifulsoup4
  • census
  • lxml
  • requests
  • sqlalchemy
  • scikit-learn
  • mlxtend
  • seaborn
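
After running the pip3 commands, it can help to confirm which of the listed packages Python can actually find. The following is a minimal sketch, not part of the course materials: the PACKAGES mapping and the missing_packages helper are illustrative names introduced here. The mapping is needed because two of the listed packages are imported under a different name than the one given to pip3 (beautifulsoup4 is imported as bs4, scikit-learn as sklearn); the remaining import names are assumed to match their pip names.

```python
import importlib.util

# Hypothetical helper data: pip name -> import name for the packages listed above.
PACKAGES = {
    "jupyterlab": "jupyterlab",
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "beautifulsoup4": "bs4",      # imported under a different name
    "census": "census",
    "lxml": "lxml",
    "requests": "requests",
    "sqlalchemy": "sqlalchemy",
    "scikit-learn": "sklearn",    # imported under a different name
    "mlxtend": "mlxtend",
    "seaborn": "seaborn",
}


def missing_packages(packages):
    """Return the pip names whose import name cannot be located."""
    missing = []
    for pip_name, import_name in packages.items():
        try:
            # find_spec locates a module without importing it
            found = importlib.util.find_spec(import_name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    not_found = missing_packages(PACKAGES)
    if not_found:
        print("Not found; try: pip3 install " + " ".join(not_found))
    else:
        print("All listed Python packages were found.")
```

Run the script with the same python3 interpreter that pip3 installs into; otherwise packages may be reported as missing even though they are installed for a different interpreter.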

Acknowledgments

Portions of the instructional materials and our pedagogy are adopted from The Carpentries.