Cyberhelp

for Researchers & Teams

Environment Modules

Software on the SESYNC server (as of September 2021) is organized using environment modules. Modules allow the user to modify their working environment on the server by loading specific versions of software. SESYNC maintains multiple versions of commonly used software on the server so that users can load the version they want to work with. This is great for reproducibility because you can guarantee that the same version of (for example) R is used every time you run a piece of code. And even if SESYNC IT staff run updates and install a new version of R on the SESYNC server, you can still use the older version if you want to ensure that your code runs exactly the same way every time.

Creating Visualizations to Enhance Scientific Teamwork

There are a variety of visualization options available to help scientific teams create and share a collective vision of their research project. Developing these visualizations can promote deeper learning and engagement by transcending verbal and written boundaries across diverse perspectives within a team. These visuals allow groups to communicate complex socio-environmental concepts within the team and to the scientific community and the public.

Software Solutions for Qualitative Data Analysis

Qualitative data analysis (QDA) is the process of searching for patterns and themes in large volumes of unstructured data to help answer relevant research questions. This research approach commonly uses computer software, known as QDA software, to search for these connections. Common examples of qualitative data include interviews, focus groups, observations, short text surveys, video or audio transcripts, and secondary research (e.g., social media data, journal articles, historical records).

Technological solutions to streamline your team’s virtual project

How might my team effectively use tech solutions to streamline collaborative work? We frequently get asked about the nitty-gritty of virtual collaboration throughout a synthesis project. These deep dives into how your team might use tech solutions in different phases of your project provide a few examples. Please note that these are just examples and your mileage may vary; however, a few key points apply broadly.

Zoom for SESYNC Teams

SESYNC can offer Zoom access, customization, and technical support to help you get the most out of your virtual meetings. Start a discussion with SESYNC staff by emailing cyberhelp@sesync.org.

Run Code in Parallel on a Windows Virtual Machine

If you have a program that runs only on Windows and you would like to run it in parallel, you cannot use SESYNC’s Slurm cluster. However, you can take advantage of SESYNC’s Winanalytics virtual machine, which has multiple cores and much more RAM than a typical laptop. You will need to write a little bit of code in PowerShell, Windows’ shell scripting language, which is fairly similar to Bash scripting. Here is a quick walkthrough of how to do this.

The Compute Cluster

SESYNC provides a high-performance computing cluster for memory-intensive and time-intensive computing tasks. (FAQ: What is the SESYNC cluster?) You can connect to the cluster through our ssh gateway service running at ssh.sesync.org or by submitting jobs through RStudio. The workflow for using a cluster is a little bit different from a typical run in R or Python. In addition to your processing code, you must give the cluster a list of execution instructions and a description of the resources your analysis will require. Any output or error messages from your script will be written out to a file called slurm-[jobID].out.
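As a minimal sketch in Python, assuming you capture the text that `sbatch` prints when a job is accepted (normally `Submitted batch job <jobID>`), the hypothetical helper below turns that message into the name of the default output file. It also assumes your job script does not redirect output elsewhere with the `--output` option.

```python
import re

def slurm_output_file(sbatch_stdout):
    """Return the default output file name for a job submitted with sbatch.

    sbatch reports a successful submission as "Submitted batch job <jobID>".
    Unless --output is set, Slurm writes the job's stdout and stderr to
    slurm-<jobID>.out in the directory the job was submitted from.
    """
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"could not find a job ID in: {sbatch_stdout!r}")
    return f"slurm-{match.group(1)}.out"

print(slurm_output_file("Submitted batch job 123456"))  # slurm-123456.out
```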

Command-line Stata (from RStudio Server)

To support existing data analysis pipelines that use the Stata software, SESYNC has purchased a Stata license and created a dedicated virtual machine for remote use by affiliated researchers. This quick start guide explains the essential steps for evaluating Stata commands over SSH or from SESYNC’s RStudio server.

Connect to a Database Server

This Quick Start guide will walk you through establishing a connection to a database on SESYNC’s server. Access to your pursuit’s relational database management system (RDBMS) requires communication between a server application (PostgreSQL or MySQL) and a client application (RStudio, Jupyter, psql, etc.).

Jupyter Server

SESYNC provides access to remote JupyterLab sessions via a web browser. The Jupyter Project provides an environment for Python development, and SESYNC’s Jupyter Server adds direct connections to resources like shared file storage, databases, GitLab, and a compute cluster.

Publish Synthesis Data Products

Choosing to publish your data products in a long-term repository can:

Share Files and Folders with Anyone

SESYNC researchers and staff can provide a link that lets external collaborators upload files to, or download files from, any research data directory they can access. We recommend this mechanism for receiving datasets from external collaborators: create a new folder and turn it into a public “file drop”, as described below.

RStudio Server

SESYNC provides access to a remote RStudio session, via a web browser, in order to work in R while directly connected to other SESYNC resources (file storage, databases, the cluster, etc).

Remote Meeting Participants

When you submit your travel planner to our travel office, please make sure to include a list of all remote participants who will be joining your meeting. You may add participants after the 8-week deadline, but we require that you notify us of all participants at least one week before the start of your meeting.

Publish a Shiny App

To publish an R Shiny application on the SESYNC server, copy your files from your working directory to the shiny-apps-data shared folder (/nfs/shiny-apps-data on RStudio Server). Please contact SESYNC IT staff if you would like to host an app on SESYNC’s Shiny Server for the duration of your project’s lifecycle.
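A minimal Python sketch of the copy step, assuming you run it somewhere the shared folder is mounted, such as a terminal on RStudio Server. Putting each app in its own subfolder named by the hypothetical `app_name` parameter is an assumption for illustration; hosting still goes through SESYNC IT staff, so check with them about the layout they expect.

```python
import shutil
from pathlib import Path

def publish_shiny_app(app_dir, app_name, shared_root="/nfs/shiny-apps-data"):
    """Copy a Shiny app's files from your working directory into the shared folder.

    app_dir: folder containing app.R (or ui.R and server.R).
    app_name: hypothetical subfolder name for this app inside shared_root.
    """
    destination = Path(shared_root) / app_name
    # dirs_exist_ok=True (Python 3.8+) lets you re-run the copy after editing the app.
    shutil.copytree(app_dir, destination, dirs_exist_ok=True)
    return destination

# Hypothetical usage: publish_shiny_app("my-project/shiny", "my-app")
```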

Create Projects on GitLab

SESYNC offers private git hosting through our GitLab server. When you connect to our GitLab Community Edition (CE) instance using your SESYNC username and password, you’ll see a dashboard of recent activity on projects that you are part of. If this is your first time connecting, it may be a little quiet.

Bulk Uploads and Downloads by SFTP

You can upload and download data from your research data directory using an SFTP client. We recommend Cyberduck or WinSCP.

Research Data Directory

SESYNC provides a large, shared file store to host data for all projects. Project participants have access to the research data directory for their project from our compute servers, a web portal, a desktop application for syncing, and SSH.

eBeam Whiteboards

SESYNC has installed the eBeam whiteboard capture software on all of our conference room PCs and laptops.

Why is R (or Python) not found?

If you are trying to run R from the terminal and you see this scary-looking message,

How do I set up a Python virtual environment for Slurm jobs?

The purpose of a Python virtual environment is to create an isolated virtual space for your Python project. It is good to have a virtual environment because it allows you to execute code in a constant context, and each project can have its own dependencies. Any updates to Python versions or Python packages elsewhere on the system will not affect the virtual environment, ensuring that you can reproduce your results later. Currently the default Python version for new package installation on the Slurm cluster, the Jupyter server, and the RStudio server (as of September 2021) is Python 3.8.6. If you would like to run your Slurm Python jobs with other Python versions, or use a different version of Python in a .Rmd notebook on the RStudio server, a virtual environment is necessary if you want to install additional packages.
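A minimal sketch using Python's standard-library `venv` module, which does the same thing as running `python3 -m venv <dir>` in a terminal. The helper name and the directory in the usage comment are hypothetical. The environment inherits the Python version of whichever interpreter runs this code, so start it with the version you want your Slurm jobs to use.

```python
import venv
from pathlib import Path

def make_project_env(env_dir, with_pip=True):
    """Create an isolated Python virtual environment in env_dir.

    Returns the path of the environment's pyvenv.cfg file. The environment
    uses the Python version of the interpreter running this code.
    """
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    return Path(env_dir) / "pyvenv.cfg"

# Hypothetical usage (the directory name is only an example):
# make_project_env(Path.home() / "venvs" / "my-slurm-project")
```

Inside a Slurm job script you would then activate the environment with `source <env_dir>/bin/activate`, or call `<env_dir>/bin/python` directly, before running your Python code.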

How do I change the git default branch name for new repositories from master to main?

NOTE: See the note on terminology in our basic git lesson for more background on why default branch names are changing from master to main across git platforms. Changes are ongoing across all git platforms, so this FAQ may be out of date by the time you read it!

What is Slack and why is it useful for team science?

Slack is a messaging platform where project members can communicate and collaborate by sharing messages, files, and tools, helping your team manage its project effectively.

Why am I locked out of the RStudio or Jupyter server?

TL;DR: Your home directory might be over its quota. Either move data from there to your research data directory or contact SESYNC cyberhelp for assistance.
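To find what is filling your home directory before you move it, a sketch like the one below may help. It is a hypothetical helper built on Python's standard library: it only adds up file sizes and does not report your actual quota, and it skips symlinks so that links into research data directories are not counted. Unlike `du -sh ~/*` in a terminal, it also includes hidden folders such as `.cache`.

```python
import os

def largest_items(path, top=10):
    """Return up to `top` (size_in_bytes, name) pairs for items directly under `path`."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_symlink():
            continue  # skip links, e.g. symlinks to /nfs research data directories
        if entry.is_dir():
            total = 0
            # os.walk does not follow symlinked folders by default
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    file_path = os.path.join(root, name)
                    if not os.path.islink(file_path):
                        total += os.path.getsize(file_path)
        else:
            total = entry.stat().st_size
        sizes.append((total, entry.name))
    sizes.sort(reverse=True)  # biggest first
    return sizes[:top]

# Hypothetical usage:
# for size, name in largest_items(os.path.expanduser("~")):
#     print(f"{size / 1e9:.2f} GB  {name}")
```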

How do I create a virtual environment on the Jupyter server?

Collaborative coding can benefit from having everyone use the same computing environment, including the same versions of packages, data, and code. In Python, this can be done using virtual environments. You can create a virtual environment for each project or analysis, as long as they are in different directories. On SESYNC’s Jupyter server, it takes a little set-up to start using virtual environments.

What technological options exist for virtual team communication?

Now that so much of our work is virtual, it can be very useful to have multiple ways of communicating with your team. Multiple lines of communication give everyone a way to provide input that they are comfortable with, and they foster asynchronous collaboration. They also make your team’s work more transparent to all participants.

How do I set up an SSH key on GitLab/GitHub?

If you have just created your first project on the SESYNC GitLab server and tried to push files to it for the first time, you might see a confusing message saying that you need to generate an SSH key so that you can push updates from your local clone of the repository to the GitLab server with the SSH protocol. You might want to do this so that you never have to enter a username and password to push commits. The SSH key takes the place of the username and password, but you need to register your local key with the remote repository first.

How can I secure my virtual meeting?

Use of video conference platforms has exploded now that we are all working from home.

What are common options for Slurm jobs, and how do I set them?

There are a few different ways to run a job on SESYNC’s Slurm compute cluster, but all of them ultimately run a command called sbatch to submit the job to the cluster. The sbatch program is part of the Slurm software package and has a lot of different options. These include a maximum length of time your jobs can run, how much memory you are requesting, whether you want to be notified by email when your job finishes running, etc. It’s possible to run a Slurm job without setting any of the options and going with all defaults, but there are times when you might want to customize the options.
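The sketch below is a hypothetical Python helper that writes the header of a job script with a few of those options as `#SBATCH` directives: `--time` (maximum run time), `--mem` (memory per node), and `--mail-type`/`--mail-user` (email notification). The default values are placeholders, not SESYNC's cluster defaults.

```python
def sbatch_header(time="01:00:00", mem="4G", mail_user=None, mail_type="END"):
    """Return the first lines of a Slurm job script as a string.

    time: maximum wall-clock time, e.g. "HH:MM:SS"
    mem: memory per node, e.g. "4G"
    mail_user: address to notify; no email options are written if None
    mail_type: which events trigger an email, e.g. "END", "FAIL", or "ALL"
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --time={time}",
        f"#SBATCH --mem={mem}",
    ]
    if mail_user:
        lines.append(f"#SBATCH --mail-type={mail_type}")
        lines.append(f"#SBATCH --mail-user={mail_user}")
    return "\n".join(lines) + "\n"

# Hypothetical usage: a 2-hour, 8 GB job that emails you when it ends.
print(sbatch_header(time="02:00:00", mem="8G", mail_user="you@example.org"))
```

The same options can also be passed on the command line, for example `sbatch --time=02:00:00 --mem=8G job.sh`, where they take precedence over the `#SBATCH` lines in the script.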

Cyber Resources

This infographic shows the relationships between the different cyber resources available to SESYNC users, and their intended uses.

How do I work with a git-versioned project in Jupyter Lab?

There are two ways to work with git projects in Jupyter Lab. You may either use the git extension for Jupyter Lab for a point-and-click interface, or issue git commands directly on the command line.

How do I run an interactive job on the cluster?

SESYNC’s Slurm compute cluster allows users to run big memory- and processor-intensive jobs. Many users don’t know that you can access the memory and processing power of the cluster interactively, typing commands directly into the command line or into an R or Python session. This FAQ briefly describes how to start an interactive job on the Slurm cluster.

Where should I store temporary files created by Slurm cluster jobs?

Many jobs on the Slurm compute cluster generate large files that take up a lot of storage space but are only needed temporarily. There are two easy places to store large temporary files created by cluster jobs: temporary storage on a specific node (/tmp/) and scratch space accessible from all nodes (/nfs/scratch/).
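A minimal Python sketch of choosing between the two locations, assuming the paths described above. `tempfile.mkdtemp` creates a uniquely named folder so that concurrent jobs do not overwrite each other's files; the helper name and folder prefix are hypothetical.

```python
import os
import tempfile

def scratch_workdir(shared=False, scratch_root="/nfs/scratch"):
    """Create a uniquely named temporary folder for one job's intermediate files.

    shared=False: node-local storage, visible only on the node running the job.
    shared=True: scratch space under scratch_root, visible from all nodes.
    """
    # gettempdir() honours $TMPDIR if it is set and otherwise usually returns /tmp.
    base = scratch_root if shared else tempfile.gettempdir()
    job_id = os.environ.get("SLURM_JOB_ID", "local")  # set by Slurm inside a job
    return tempfile.mkdtemp(prefix=f"job{job_id}_", dir=base)
```

Because these files are only needed temporarily, delete the folder (for example with `shutil.rmtree`) once the job no longer needs it.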

How much data can I store in my research data directory?

TL;DR: Try to have a general idea of your data storage needs, and discuss it with the data science team if you are concerned, but do not be too worried unless you are going well over 1 terabyte.

What resources exist for collaborative writing?

There are several resources available for collaborative writing, depending on which platform you prefer. These are the resources SESYNC groups have used successfully in the past.

How do I create a symlink to a research data directory?

To access and browse your research data directory from Jupyter or RStudio, it is best practice to create a symlink (symbolic link) that points to your data directory and lets you browse the files in it.
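From a terminal, this is usually done with `ln -s /nfs/cooltrees-data ~/cooltrees-data`; the Python sketch below does the same with `os.symlink`. The directory `cooltrees-data` is the example used elsewhere in these FAQs, and placing the link in your home directory is an assumption; substitute your own project's directory and preferred location.

```python
import os
from pathlib import Path

def link_data_dir(data_dir="/nfs/cooltrees-data", link_path=None):
    """Create a symlink (by default in your home directory) pointing at a research data directory."""
    target = Path(data_dir)
    link = Path(link_path) if link_path else Path.home() / target.name
    # Leave any existing file, folder, or link with that name untouched.
    if not link.exists() and not link.is_symlink():
        os.symlink(target, link, target_is_directory=True)
    return link
```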

How does SESYNC wind down computational support?

SESYNC’s data storage and computational resources are available to pursuit participants for approximately one year after the final meeting to allow completion of project-related tasks.

Why isn't my research data directory in '/nfs'?

It is, or at least will be as soon as you need it! Any research data directory you have access to will be mounted to the filesystem at “/nfs” when you access it. If you have not touched any of the files in there for a while, it may have un-mounted and appear to be missing. So if you don’t see your “*-data” folder under “/nfs”, just navigate directly to the folder and it will instantly mount. For example, if your research data directory is “cooltrees-data”, then enter the full path as “/nfs/cooltrees-data” in the file browser or from the command line.
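In a script, the same rule applies: refer to the research data directory by its full path instead of looking for it in a listing of `/nfs`. The hypothetical helper below assumes directories are named `<project>-data`, as in the example above, and assumes that accessing the full path from code (here, a directory listing) triggers the mount just as navigating to it in a file browser does.

```python
import os

def open_research_dir(project, root="/nfs"):
    """Return the sorted contents of a research data directory, given its project name.

    The full path (e.g. /nfs/cooltrees-data) is used directly instead of
    searching a listing of /nfs, where an un-mounted folder would not appear.
    """
    path = os.path.join(root, f"{project}-data")
    return sorted(os.listdir(path))

# Hypothetical usage: open_research_dir("cooltrees") lists /nfs/cooltrees-data
```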

Why does my virtual machine show less memory than I requested?

SESYNC Windows client virtual machines are set up to use dynamic memory. This means that your virtual machine will show a different amount of available memory depending on its current usage. You still have access to the full amount of memory allocated if needed: the virtual machine automatically grabs more memory from the hypervisor when it needs it.

Why does git show that all my files changed when I didn't change them?

Due to some quirks on our storage system your git repo may show that all of your files have modifications. If you perform a ‘git diff’ you will see a list that looks like:

What is a virtual machine?

A virtual machine is a Windows or Linux machine that runs on and shares computing resources with a physical machine known as a hypervisor. Virtual machines allow the deployment of multiple machines or services on one or several hypervisors to better utilize computing resources (CPU cores, memory, etc.).

What support does SESYNC provide for custom virtual machines?

SESYNC can deploy custom Windows and Linux virtual machines for use by groups. If your group needs software or a service that our shared infrastructure does not provide, we can deploy a virtual machine to meet your needs.

When are the server maintenance windows?

What happens to my jobs during the maintenance window?

We’re sensitive to the fact that your jobs may need to run through our maintenance window, and we will make a reasonable effort to ensure they aren’t disrupted. To keep disruption to a minimum, these are the steps we take:

How do I access Linux resources?

SESYNC Linux resources are deployed on a private network at SESYNC and are accessed via our ssh gateway at ssh.sesync.org. These resources include RStudio, Jupyter lab, and our compute cluster. Please DO NOT run your computational processing on the ssh gateway; it has limited memory and processing power. Instead, use the ssh gateway to submit jobs to SESYNC’s compute cluster or to connect to your virtual machine.

How do I access my research data directory?

Navigate to https://files.sesync.org and log in with your SESYNC username and password. The folders listed under “External storages” are each a shared research data directory accessible to participants in the corresponding project.

Should I use GitHub or SESYNC's GitLab?

If you already have projects on GitHub that you are working on, we prefer that you continue to use GitHub due to its open nature. We’ll gladly push and pull code from your public repository. We provide GitLab locally for projects that are just starting up, have sensitive data, or are not quite mature enough to be pushed out into the world.

What's the difference between git, GitHub, and GitLab?

The three are often a source of confusion.

Can code move between GitLab, GitHub, and Bitbucket?

Yes! You can push a local git repository to any new remote repository. Please note that only your source code will move; any additional features you use (e.g., wiki, issues) will need to be copied manually.

What is the compute cluster?

SESYNC’s computational cluster (see quickstart page) enables users to run medium-to-large scale analyses by distributing multiple, independent tasks across many computers. This setup is ideal for tasks that require applying the same algorithm or a parameter set over independent units in a large data set.
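One common way to apply the same algorithm over independent units on a Slurm cluster is a job array: you submit one script with `#SBATCH --array=0-3`, and each task reads its own index from the `SLURM_ARRAY_TASK_ID` environment variable. The Python sketch below shows that pattern; the parameter table and the `analyze` function are hypothetical placeholders for your own data and algorithm, and this is only one of several ways to distribute work on the cluster.

```python
import os

# Hypothetical parameter set: one entry per independent task.
PARAMETERS = [
    {"site": "A", "threshold": 0.1},
    {"site": "B", "threshold": 0.1},
    {"site": "A", "threshold": 0.5},
    {"site": "B", "threshold": 0.5},
]

def analyze(site, threshold):
    """Placeholder for the algorithm applied to each independent unit."""
    return f"{site}:{threshold}"

def run_one_task():
    # Each task in a Slurm job array gets its own SLURM_ARRAY_TASK_ID
    # (0-3 with `#SBATCH --array=0-3`); default to 0 when run outside Slurm.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return analyze(**PARAMETERS[task_id])

if __name__ == "__main__":
    print(run_one_task())
```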

How do I access my Windows virtual machine?

SESYNC provides remote access to all desktop resources through browser-based Remote Desktop Protocol (RDP). Browse to https://desktop.sesync.org and log in with your SESYNC username and password. Select one of the virtual machines to connect to its desktop (only machines you have permission to access are shown).

Do I have to use the cluster?

We highly recommend using the scheduled cluster for running all of your CPU-intensive or long running programs. Below is SESYNC policy for long running processes on our different types of resources:

How do I create an RStudio project with git?

RStudio projects are folders that contain project files and a special .Rproj file. To link an RStudio project with a git repository, follow these steps:

How do I change my SESYNC password?

Point your web browser to https://pwm.sesync.org.

What is my SESYNC username?

A SESYNC username is usually your first initial followed by your last name (e.g., “John Smith” is jsmith). Common or very long names may not follow this pattern.
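The usual pattern can be written as a one-line rule. This hypothetical Python helper only illustrates that pattern; it cannot predict the exceptions for common or very long names, so confirm your actual username with cyberhelp@sesync.org if the guess doesn't work.

```python
def guess_username(full_name):
    """Usual SESYNC pattern: first initial + last name, all lowercase."""
    parts = full_name.split()
    return (parts[0][0] + parts[-1]).lower()

print(guess_username("John Smith"))  # jsmith
```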

What resources are available?

SESYNC has an extensive set of computing resources and expertise available for researchers. Download a high-level overview of all services and support SESYNC offers for general information, or scan the tables below for a quick reference. Direct all questions to cyberhelp@sesync.org.

How do I contact IT or research support staff?

Email cyberhelp@sesync.org with your question or support request.

Open data during a pandemic: ESA 2021 Ignite session

The global pandemic increased the urgency of making data open and accessible.

Summer Institute 2021: a year older and wiser

A few weeks ago, 30 researchers from 11 teams made their virtual way to SESYNC for the 2021 Computational Summer Institute. It was a pleasure to interact with researchers from diverse backgrounds working on an array of interesting questions in socio-environmental synthesis. In this blog post, I’ll go over some of the highlights of the week and the lessons we as instructors learned.

Making a fifty-state USA map, 2021 edition

tl;dr: This post walks you through making a map of the fifty United States with ggplot2, where Alaska and Hawaii are moved from their true geographic locations.

Making free maps with R, ggspatial, and Mapbox

These days, it seems like every time you turn around, a new R package for making maps comes out and renders an older one obsolete … it’s hard to keep track of! There are tons of competing alternatives for both traditional (static) maps and interactive maps. Tools like leaflet and mapview are great for interactive maps, but this post focuses on a classic: a static map. That’s right, no zooming in and out or panning back and forth. Just a nice clean simple map.

Geocoding with R

Data is not perfect. We all know that. A little while ago I stumbled onto an Annotated Honey Bee Images dataset from Kaggle and decided to map it, except I couldn’t map it right away. The dataset included text for the city names where the images were collected, but not the latitude and longitude coordinates needed to map the locations. I decided to do some geocoding to get the coordinates for each location to map the bees!

Data exploration to cultivate better living at the 2021 UMD Data Challenge

Today’s post here on The CyBlog is a guest post by Allie Cahanin and Katherine Toren, Grand Prize winners of the 2021 UMD Data Challenge.

Goodbye %>%, hello := (Using R data.table to speed up my data science)

This is a little story about how I learned to stop worrying and love data.table, a great (and in my opinion underrated) package for doing data science in R.

How open reproducible methods benefit the research community: a shiny story

Following up on Kelly H’s recent excellent blog posts on accessibility in Shiny apps, I’d like to tell a little story that illustrates how R helps make open science and reproducibility possible. After all, accessibility also includes making it possible for other community members to use and benefit from work you’ve done. We had a problem which was solved with the help of the R community, and I was able to get more bang for my buck: the work I did is now part of a package that anyone can access. That’s more efficient and speeds the pace of research! This is only possible with the community of great people that work on R — they are often willing to donate their time free of charge to help other people solve problems.

The carbon footprint of R code, and how to reduce it

Carbon footprint. For many of us, that term evokes cars belching exhaust and cows belching methane as they wait to be turned into hamburgers. But the carbon footprint of our digital infrastructure is enormous too! Data centers used approximately 1% of all electricity worldwide in 2018 and almost 2% of electricity in the U.S. There are efforts underway, including at the University of Maryland, to increase the use of renewable energy to power data centers, to recycle waste heat for beneficial uses, and to cool data centers more efficiently. Even so, power use by data centers is forecast to rise.

Shiny App Accessibility, Part 2: Accessible Design

Go back to part 1 of our series on Shiny App Accessibility

Shiny App Accessibility, Part 1: Only You Can Prevent Link Rot

Continue onto the next post for part 2 of our series on Shiny App Accessibility

Resources to help you learn GitHub Pages

The SESYNC Cyber Team has compiled some resources, including tutorials and examples, on how to use GitHub Pages. Most of them are based in Markdown and Jekyll. Markdown is a “lightweight markup language,” meaning a way to write a text document with minimal formatting codes that can be rendered into a document such as a webpage. Jekyll is a “gem” written in the Ruby language (to be cute, but confusing, they call packages in Ruby “gems”) that turns documents written in Markdown into (static) HTML sites with nice layouts. It isn’t necessary to use Markdown and Jekyll to use GitHub Pages, but Jekyll has built-in support for GitHub Pages so everything integrates pretty well. This means GitHub takes care of converting your human-readable files (like Markdown) to HTML, including all of the relative link paths to navigate your website based on configuration files. Hugo is an alternative to Jekyll.

What we learned teaching a virtual course this summer

Over the last few months, many people have conducted instructional courses virtually and shared their experiences online. In that respect, our reflections below on what we learned conducting a virtual course this summer are not ground-breaking. We provide them for those who attended the course, for our future selves, and anyone else interested in our specific instructional circumstances.

Tips for a smooth R(Studio) workflow and reproducible R code

A lot of people at SESYNC use R (often through RStudio on rstudio.sesync.org), are interested in making their research as reproducible as possible, and want to save time and make life easier for themselves. That’s why I wrote this blog post with some ideas for how you can make your R workflow smoother, easier for you and anyone you ask to help you, and more in line with reproducible science best practices!

Best Practices for SESYNC Virtual Meetings

Yet another set of recommendations for how to transition to remote work!? Yes! However, this post presents a curated list of strategies specifically to help leaders of synthesis teams who are faced with the prospect of holding one or more Pursuit meetings online rather than at our center in Annapolis, Maryland. Our model thus far has emphasized the value of those in-person meetings; we even designed the physical layout to maximize interaction! However, like you, we’re now adapting. We will continue to provide online computing environments and support for your collaborations, and support new meeting formats in order to help you build or maintain team collaborations and keep forward momentum on research.

Oh, the Places You Can Get Census Population Data For!

Every 10 years, the U.S. Census Bureau conducts a nationwide survey to count the number of people in the nation, known as the Decennial Census. Although this seems like a straightforward concept, using these data to appropriately quantify patterns or trends (e.g., considering effects of the modifiable areal unit problem) for any given location within the country may require getting acquainted with some nuanced jargon. This post introduces some concepts to help you get started.

Google Dataset Search: A very helpful and definitely not evil tool for finding data

Even though Google has attracted its fair share of controversy, I have to admit Google’s got to where they are because their tools are pretty good. Recently I stumbled across another of their tools I’m finding really useful: the Google Dataset Search.

Databases, huh? What are they good for?

Synthesis research involves assembling multiple data sets from different sources. Integrating those data into a format that facilitates exploration, visualization, and eventual analysis is often the most time-consuming and tedious part of the research process—however, careful attention and a little bit of strategy at early stages can pay huge dividends later on.
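As a small illustration of that strategy, the Python sketch below uses the standard-library `sqlite3` module to store two made-up data sets in separate tables that share a `site_id` key, then integrates them with a join. SQLite is used here only because it needs no server; essentially the same SQL works in PostgreSQL or MySQL.

```python
import sqlite3

# Made-up example data from two "sources" that share a site_id key.
sites = [(1, "Site A"), (2, "Site B")]
surveys = [(1, 2019, 42), (1, 2020, 37), (2, 2019, 55)]

con = sqlite3.connect(":memory:")  # pass a file name instead to keep the database
con.execute("CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE survey (site_id INTEGER REFERENCES site (site_id), "
    "year INTEGER, n_observed INTEGER)"
)
con.executemany("INSERT INTO site VALUES (?, ?)", sites)
con.executemany("INSERT INTO survey VALUES (?, ?, ?)", surveys)

# Site details are stored once and joined in when needed,
# rather than copied into every survey row.
rows = con.execute(
    "SELECT site.name, survey.year, survey.n_observed "
    "FROM survey JOIN site USING (site_id) "
    "ORDER BY site.name, survey.year"
).fetchall()
for row in rows:
    print(row)
```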

Using the rslurm package to run code in parallel

This blog post will walk you through a quick example of how to use the rslurm package to parallelize your code.

ggplot tricks not to forget about

Tweaking figures for presentations or publications can be a tedious process, especially when I always need a reminder of “how to use greek letters or subscripts in y-axis”, “remove legend”, and “r pch”. Here is a collection of some ggplot2 functions and arguments that I find particularly useful and want to remember.

Creating visualizations with DiagrammeR

Have you ever needed to create a visualization of a research process or statistical model that isn’t directly plotted from data? For example, a conceptual diagram, mind map, flowchart of your research process, or statistical model diagram. The R package DiagrammeR makes it much easier to create high quality figures and diagrams in situations like these.

Publishing Data Papers

Alongside sharing and publishing data sets, there are a variety of ways to publish accompanying journal articles to provide a “data description” that either includes or refers to a specific dataset. This is a way to offer narrative context beyond standard metadata, such as describing the motivation and process behind compiling the dataset being described. Additionally, this type of publication can offer formal recognition for all team members involved in creation of the dataset.

Sharing your RShiny App

RShiny and related packages have lowered the bar for making web applications in R without requiring knowledge of the languages of web browsers (CSS, JavaScript, HTML). This also means that sharing your app usually requires finding a platform that can run R code. Here are some (non-mutually exclusive!) options to consider for making your Shiny apps available on the web.

Adventures in Windows Dynamic Memory

SESYNC’s Windows virtual machines are set up to use dynamic memory. This means that your virtual machine will show different memory usage based on its current usage; however, you will still have access to the full amount we allocated to you.

Raster Change Detection Analysis with Two Images

Raster change analysis with two dates: Hurricane Rita

Making "dataspice" at #runconf18

As a perk of being an rOpenSci fellow, I recently got to attend the organization’s 5th ‘unconference’. This meeting brought together around 60 R users from around the world to spend a few days cooking up some new tools for the R community based on ideas discussed online leading up to the event.

Build a Shiny App to Browse MODIS Data

In preparation for our recent geospatial short course, I spent some time getting up to date on the new features in the leaflet R package. There are so many possibilities between the new add-ons in “base” leaflet, like inset mini maps and measuring tools, and even more functionality being added all the time in leaflet.extras, mapedit, and mapview.

Standardizing Non-standard Evaluation in R

Partway through her LTER Postdoc at SESYNC, ecologist Meghan Avolio ran into trouble manipulating her data on plant communities with dplyr functions. I had encouraged Meghan to modularize her scripts by writing functions for common steps in her pipeline (such as converting count data into rank-abundance curves). “You’ll love writing functions!” I said wrongly.

Writing Data Management Plans

Many funding agencies require proposals to include a section addressing plans for data management. This includes how you will handle data as it is being collected during the project, as well as plans for sharing and archiving once the project is complete. Here is a collection of resources we’ve found helpful for writing data management plans (DMPs):

Images for Data Exploration in RShiny Apps

Photos, as a source of data, or to aid in the interpretation of data, can be a useful addition to RShiny applications. Here are two examples of using photo data: one that displays images from URLs, and another that uses species names to find pictures of animals.

Tidy Data in Python

Get your data in shape with pandas.

Plots in R

Craft publication-quality graphics with ggplot2.

Model Formulas

Write formulas for linear and GLM regression models in R.

Text Mine

Carve your texts into structured data.

Data APIs in R

Acquire data from websites and APIs using R.

Data APIs in Python

Acquire data from websites and APIs using Python.

RMarkdown

Extend your data pipeline with RMarkdown and Shiny.

Leaflet Maps

Make interactive maps in R using the leaflet package.

Shiny Apps

Get interactive with the Shiny R package.

Git in the Shell

Perform version control from the command line.

Advanced git

Learn advanced git techniques with GitHub and RStudio.

Advanced Tidyverse

Use piped workflows for efficient data cleaning and visualization.

Spatial R Packages

Manipulate geospatial data with open source tools.

Vector Operations in R

Manipulate vector data.

Raster Operations in R

Efficiently analyze raster upon raster.

Open Source Geospatial tools

Meet the open source stack underlying geospatial data.

Raster Classification in R

Classify your remotely-sensed data.

OPeNDAP with Python

Access Land Data Assimilation System models with OPeNDAP.

Relational Databases using SQLite

Leverage relational models for organizing and querying data.

Relational Databases

Make your data safe, scalable and relational.

Basic NetLogo

Build agent-based models with a simple graphical interface.

NetLogo Scripting

Implement open agent-based models.

Spatial NetLogo

Use spatial data in NetLogo ABMs.

Data Documentation & Publishing

Package your data and metadata for publication.

Basic R

Start learning R in RStudio.

Basic Python

Start learning Python with Pandas and Scikit-learn.

Basic git

Learn to use git with GitHub in RStudio.

Basic SQL

Speak to a database in its native language.

Tidy Data in R

Get your data in shape with tidyr & dplyr.

Maps with R

Tour R packages that make static and interactive maps.