General Information

The 2020 Summer Institute brings together ten science teams for a short course on data and software skills in socio-environmental synthesis. Through hands-on tutorials and project consultation, SESYNC staff will aim to accelerate your team’s adoption of cyber resources in all phases of data-driven research and dissemination.

Participants should expect to:

  • learn new scientific computing skills
  • overcome specific or conceptual project hurdles
  • gain coding confidence
  • have fun

Instructors:

  • Rachael Blake, Data Scientist
  • Mary Glover, Instructor
  • Kelly Hondula, Quantitative Researcher
  • Quentin Read, Data Scientist

When:

Tuesday, July 21, 2020 to Friday, July 24, 2020

Optional day for basic R training: Monday, July 20

Contact:

Please email rblake@sesync.org with any questions or for information not covered here.

Requirements

  • Participants must have use of a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) with a full-featured browser (not Microsoft Edge).
  • At least one team member must prepare and have available data for the mini-project; a sample or incomplete dataset is okay.
  • Participants must be able to access lesson materials via the internet and join video or voice conferences for coaching sessions.

Schedule

Note: Times displayed are in US Eastern Daylight Time (EDT).

[Monday]   Introduction to R and the RStudio IDE    
    Block Programming Exercise    
    Base R   R
    Visualizing Tabular Data   R > ggplot2
    Pair-coding Exercise & Scripting Exercise    
    Extra Tips for learning R    
    Optional lesson: Basic Python    
  09:00 Office Hours (3 hours)    
  13:00 Office Hours (3 hours)    
Tuesday   Collaborative & Reproducible Research   git
    Manipulating Tabular Data   R > dplyr
    Exercises for Lesson 3 & Lesson 4    
    Optional lesson: Tabular Data in Python    
  11:00 Welcome and Overview of SESYNC Jon K  
  11:15 Introduce Coaches    
  11:30 Meet the Teams    
  12:15 About Homework Exercises & GitHub    
  13:00 Exercise Review & Office Hours (2 hours)    
    Coaching (time TBD with your coach)    
Wednesday   Structure for Unstructured Data   R > tidytext
    Online Data in R   R > rvest
    Optional lesson: Online Data in Python    
    Exercises for Lesson 5 & Lesson 6    
  11:00 Exercise Review & Office Hours    
  13:00 Project Updates & Discussion (1 hour)    
    Coaching (time TBD with your coach)    
Thursday   Geospatial Data   R > sf, R > raster
    Interactive Web Applications   R > shiny
    Exercises for Lesson 7 & Lesson 8    
    Optional lesson: Raster Classification in R    
  11:00 Exercise Review & Office Hours (2 hours)    
    Coaching (time TBD with your coach)    
Friday   Database Principles   SQL, R > RSQLite
    Documenting and Publishing Data   R > dataspice
    Exercises for Lesson 10    
  10:00 Office Hours & Course Wrap-up    
  11:00 Team Presentations (3 x 10 min)    
  11:30 Break    
  11:45 Team Presentations (4 x 10 min)    
  12:30 Break    
  13:00 Team Presentations (3 x 10 min)    

Participation Guide

This year’s virtual Summer Institute will mix self-paced asynchronous lessons with instructor-led synchronous sessions. To get the most out of each day, you should:

  1. Thoroughly review and code along with the lesson material and exercises for each day prior to that day’s Office Hours and Exercise Review.
  2. Call in to the Zoom meeting for daily Office Hours and Exercise Review to ask any questions you have about that day’s lesson material. Note: Optional lessons will not be reviewed. The links to these lessons are provided only for your personal learning and development.
  3. Participate in coaching sessions for your team mini-project and apply what you learn in the lessons to your project.
  4. Ask questions of your instructors, your team, and fellow participants on Slack. Feel free to do this when you’re reviewing the lessons, or when you’re thinking about your mini-project.

FAQ

When are team coaching sessions?

Coaching session times will vary by team because each coach will be working with 2-3 teams during the institute. Teams and coaches should use the team’s Slack channel to arrange video call times that suit everyone’s schedules, needs, and time zones. Coaching sessions should generally be held between 8am and 6pm EDT.

How will I collaborate with my team on our mini-project?

How you work together is up to you and your team. You will be able to chat and hold group video calls with screen sharing through Slack at any time. We encourage you to spend roughly 3 hours each day on your project, although this may vary throughout the week. Coaches can help you strategize how to collaborate and make the best use of your time.

What should I expect if I attend Day 0?

The optional first day (Monday) will operate similarly to the rest of the week. Lesson material is linked in the schedule, and some coding demonstration videos will be sent out ahead of time to supplement those lessons. On Day 0, instructors will hold live 3-hour Office Hours sessions via Zoom at both 09:00 EDT and 13:00 EDT, where we will go over exercises and answer questions about that day’s lesson material. You should review the lesson material before attending. Attend whichever Office Hours session works best for you.

Software

Communication Software

To hold this course virtually, we will be using two software platforms for communication and collaboration. Communication with instructors and within your team will be via Slack, where you can text chat as well as video call with screen sharing. Office Hours and Exercise Reviews will be held via Zoom, which allows voice and video conferencing with screen sharing and recording. Please download these two free software platforms to the computer you will be using before the workshop.

Software   Download Site
Slack   https://slack.com/downloads/
Zoom   http://zoom.com/download/

If you are not familiar with joining virtual meetings in Zoom, please see the Zoom help pages for more info.

Computational Software

We use RStudio and Jupyter in this course, as well as many packages and dependencies associated with these two Integrated Development Environments (IDEs). SESYNC provides a cloud platform capable of supporting the software needs for the course, so there is no computational software for you to install in advance. However, you are welcome to install any or all of this software on your own machines; it is all free and open source. Feel free to request assistance at any time during the course with installing the listed software on your laptop.

The table and lists below should help you find the right way to install the software for your operating system. Windows and macOS users can install from the listed “Download Site” or by following the instructions given there. Linux users should use their distribution’s native package manager where possible; macOS users can optionally use Homebrew instead of the download site. The GDAL/OGR downloads are not essential for using spatial libraries with R installed through the given download site.

Software   Download Site   Homebrew Package(s)   Aptitude Package(s)
git   https://git-scm.com/downloads   git   git
R   https://cran.rstudio.com/   r   r-base
RStudio   https://www.rstudio.com/products/rstudio/download2/   -   -
Python 3.x   https://www.python.org/downloads/   python3   python3
Jupyter Lab   http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html   -   -
GDAL/OGR   https://trac.osgeo.org/osgeo4w/   gdal, geos   gdal-bin [1]

1: Ubuntu users will need to add the UbuntuGIS repository prior to running apt-get install gdal-bin

The following R packages need to be installed. Open RStudio and, for each package below, type install.packages("%package%") at the prompt, replacing %package% with the package name (keep the quotation marks), and press return. Follow all prompts.

  • tidyr
  • ggplot2
  • dplyr
  • raster
  • sf
  • sp
  • shiny
  • leaflet
  • rmarkdown
  • lme4
  • rstanarm
  • data.table

The following Python packages need to be installed. From a command prompt, type pip3 install %package% for each package below, replacing %package% with the package name, and press return. Follow all prompts.

  • jupyterlab
  • numpy
  • scipy
  • pandas
  • beautifulsoup4
  • census
  • lxml
  • requests
  • sqlalchemy
  • scikit-learn
  • mlxtend
  • seaborn
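
If you install locally, it can be handy to check which of the packages listed above are still missing before the course starts. The following is a minimal sketch, not part of the official course materials. It assumes Python 3.8 or later (for the standard-library importlib.metadata module), and the helper name missing_packages is made up for this illustration.

```python
from importlib import metadata

# Package names exactly as they are passed to "pip3 install" in the list above.
REQUIRED = [
    "jupyterlab", "numpy", "scipy", "pandas", "beautifulsoup4", "census",
    "lxml", "requests", "sqlalchemy", "scikit-learn", "mlxtend", "seaborn",
]


def missing_packages(names):
    """Return the names for which no installed distribution is found.

    Hypothetical helper for illustration only.
    """
    missing = []
    for name in names:
        try:
            # Looks up the installed distribution's metadata by its pip name.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Still to install: pip3 install " + " ".join(missing))
    else:
        print("All listed packages are installed.")
```

Run the script with the same Python 3 interpreter that pip3 installs into; if you have several Python installations, a package may be reported as missing simply because it was installed for a different one.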

Acknowledgments

Portions of the instructional materials and our pedagogy are adopted from The Carpentries.