Maps in R

Lesson 7 with Philippe Marchand

Top of Section


Over time, the R community has produced a collection of spatial analysis and visualization packages, giving current R users the ability to implement various tasks that previously required to specialized Geographic Information System (GIS) software.

This lesson provides a quick tour of popular tools available to read and display spatial data in R. These can be useful whether you want to quickly explore a dataset or to produce high-quality maps for print and web publications. Like for other types of figures, building maps with scripts can take some time at first, but that investment will quickly pay off in terms of reproducibility and re-use value.

Spatial data types

All types of spatial data belong in one of two main categories: vectors (points, lines and polygons) or rasters (grids of pixels).

Raster Vector
matrix table
surface shape
pixels features
bands attributes

Some raster file formats: GeoTIFF, netCDF

Some vector file formats: shapefiles, WKT, GeoJSON

Top of Section


Reading shapefiles into R

Vector data in shapefiles can be read via the readOGR function in rgdal, which is a package linking R to the open source Geospatial Data Abstraction Library (GDAL).

Here, we read a shapefile corresponding to county boundaries for the state of Maryland. This data was extracted from the full county boundaries shapefile available on the US census website.

library(rgdal)
counties_md <- readOGR("data/cb_500k_maryland", 
                       "cb_500k_maryland")

Note that the first argument to readOGR is the path of the shapefile whereas the second argument is the layer name; for simple shapefiles, it is usually identical to the file name.

The shapefile is read as a SpatialPolygonsDataFrame object, which contains both a list of 24 county (multi)polygons (counties_md@polygons), as well as a data frame (counties_md@data) where each row stores the attributes of the corresponding county.

summary(counties_md)
Object of class SpatialPolygonsDataFrame
Coordinates:
        min       max
x -79.48765 -75.04894
y  37.91172  39.72304
Is projected: FALSE 
proj4string :
[+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0]
Data attributes:
 STATEFP    COUNTYFP      COUNTYNS            AFFGEOID      GEOID   
 24:24   001    : 1   00592947: 1   0500000US24001: 1   24001  : 1  
         003    : 1   00593907: 1   0500000US24003: 1   24003  : 1  
         005    : 1   00595737: 1   0500000US24005: 1   24005  : 1  
         009    : 1   00596089: 1   0500000US24009: 1   24009  : 1  
         011    : 1   00596115: 1   0500000US24011: 1   24011  : 1  
         013    : 1   00596495: 1   0500000US24013: 1   24013  : 1  
         (Other):18   (Other) :18   (Other)       :18   (Other):18  
           NAME    LSAD        ALAND               AWATER         
 Baltimore   : 2   06:23   Min.   :2.096e+08   Min.   :6.346e+06  
 Allegany    : 1   25: 1   1st Qu.:8.279e+08   1st Qu.:2.414e+07  
 Anne Arundel: 1           Median :1.087e+09   Median :2.007e+08  
 Calvert     : 1           Mean   :1.048e+09   Mean   :2.910e+08  
 Caroline    : 1           3rd Qu.:1.222e+09   3rd Qu.:4.558e+08  
 Carroll     : 1           Max.   :1.711e+09   Max.   :1.145e+09  
 (Other)     :17                                                  

Top of Section


Basic spatial plots

To display the spatial features (points, lines or polygons) in a Spatial object, you can simply use the plot function.

plot(counties_md)

plot of chunk shp_plot

To show how multiple spatial layers can be superposed in the same plot, we will create a new SpatialPolygonsDataFrame with a single county selected by name. Note that we can subset counties_md just like a regular data frame.

To add a new layer to an existing plot, use the add = TRUE argument.

howard <- counties_md[counties_md[["NAME"]] == "Howard", ]
plot(counties_md)
plot(howard, col = "red", add = TRUE)

plot of chunk plot_add

Other base R functions, such as text, always add elements to an existing plot. In the code below, note that calling coordinates on a polygon object returns the center point of each polygon.

plot(counties_md)
plot(howard, col = "red", add = TRUE)
text(coordinates(counties_md), 
     labels = counties_md[["NAME"]],
     cex = 0.7)

plot of chunk plot_text

Exercise

Starting from a fresh map, print numbers on each county in order of the smallest (1) to largest (24) in land area (“ALAND” attribute). Hint: Use rank(x) to get ranks from a numeric vector x.

Top of Section


Reading rasters into R

While vector spatial layers are composed of geometrical objects defined by their vertices, raster layers are grids of pixels with attached values.

We start by loading the raster package in R and importing a raster file with the eponymous raster function. This file is a portion of the National Land Cover Database, which we already cropped and reduced to a lower resolution in order to speed up processing time for this tutorial.

library(raster)
nlcd <- raster("data/nlcd_agg.grd")

A raster is fundamentally a data matrix with associated spatial properties (e.g. extent, resolution and projection) that allow its values to be mapped onto geographical space. You can view these key properties – the raster’s metadata – by typing the object name in the console.

nlcd
class       : RasterLayer 
dimensions  : 2514, 3004, 7552056  (nrow, ncol, ncell)
resolution  : 150, 150  (x, y)
extent      : 1394535, 1845135, 1724415, 2101515  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
data source : /nfs/pmarchand-data/Training/maps-in-R-lesson/_slides/data/nlcd_agg.grd 
names       : nlcd_2011_landcover_2011_edition_2014_03_31 
values      : 0, 95  (min, max)
attributes  :
        ID      COUNT Red Green Blue Land.Cover.Class Opacity
 from:   0 7854240512   0     0    0     Unclassified       1
 to  : 255          0   0     0    0                        0

The values in raster cells range from 0 to 255, yet land cover is a categorical variable. The mapping of numbers to categories can be found in the raster’s attribute table:

attr_table <- nlcd@data@attributes[[1]]

Each land cover category has a distinct color specified by the “Red”, “Green” and “Blue” columns.

We can visualize the whole raster with plot.

plot(nlcd)

plot of chunk raster_plot

At this point, we might want to superpose the county boundaries on top of the raster image. However, the following instruction does not show the county polygons. Why not?

plot(counties_md, add = TRUE)

Top of Section


Reading and changing projections

In order for two spatial objects to be comparable, they must be represented in the same coordinate reference system. This can be a geographic coordinate system or a projected coordinate system.

A geographic coordinate system (GCS) represents each point on Earth by two angular coordinates, a longitude and a latitude. In effect, it approximates the irregular “sea level” surface of the Earth (the geoid) by an ellipsoid (a sphere flattened at the poles). The specification of this ellipsoid, or datum, has changed over times; current global maps are generally based on the WGS 84 datum, but other systems are in use for older or regional maps.

Geoid approximation

A projected coordinated system associates angular coordinates to points on a plane, in order to produce two-dimensional maps. No such projection can faithfully reproduce the three-dimensional relationship between any set of points on the globe. For example, the Mercator projection (left) preserves angles, which is useful for navigation, but it greatly inflates areas at the poles. The Lambert equal-area projection (right) preserves areas at the cost of shape distortion away from its center.

Projections

Source

Web-based tiled map services like Google Maps use a simplified, less accurate Mercator projection (Web Mercator) to optimize rendering speed.

Projection in R

Coordinate systems in R are described in a standard PROJ.4 string format.

The counties_md polygons layer uses a geographic coordinate system (“+proj=longlat”) based on the NAD83 datum, whereas the nlcd has an Albers equal-area projection (“+proj=aea”).

proj4string(counties_md)
[1] "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(nlcd)
[1] "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
Question
Why is the equal-area projection useful for a land cover dataset?
Answer
Because each cell has the same area, the proportion of cells with a given land cover class is a good estimate of the fraction of total land area covered by that class

Returning to our original problem, we can superpose the two layers by first converting counties_md to the same projection as nlcd. The spTransform function performs this operation.

counties_proj <- spTransform(counties_md, 
                             proj4string(nlcd))
plot(nlcd)
plot(counties_proj, add = TRUE)

plot of chunk sptransform

Top of Section


Masking a raster

To conclude this presentation of rasters, we introduce the mask function. Masking a raster means removing some of its data (i.e. setting it to NA) based on a logical condition.

For example, the masking layer nlcd == 81 returns TRUE for cells with land cover class 81 (pasture) and FALSE otherwise. With the setting maskvalue = FALSE, the code chunk below masks any non-pasture cells and plots the resulting raster.

pasture <- mask(nlcd, nlcd == 81, maskvalue = FALSE)
plot(pasture)

plot of chunk mask

Exercise

Create and plot a mask for a different type of land cover.

You may consult attr_table to find the numeric ID matching a specific land cover class.

Top of Section


Adding data to maps

The tmap (“thematic map”) package simplifies map-based visualization of data attributes associated with a given shapefile. Its logic and syntax are particularly easy to learn for those users already familiar with the popular graphics package ggplot2.

Quick mapping with qtmap

Like the qplot function in ggplot2, qtmap serves to make a quick, less customized map from a single line of code. Its first argument is a spatial object forming the base of the map.

library(tmap)
qtm(counties_proj)

plot of chunk qtmap

If the shapefile has associated data, its column names (as quoted strings) can be mapped to graphical elements of the plot. In the example below, the “fill” color of counties depends on their water area, and county names are shown at the center of each polygon.

qtm(counties_proj, fill = "AWATER", text = "NAME")

plot of chunk qtmap_data

Layered thematic maps (ggplot style)

For more detailed maps, you can add multiple layers defined by tm_ functions. In the code below, tm_shape sets the spatial object from which the successive layers are derived; tm_borders adds the polygon outlines; tm_fill and tm_text define the same graphical elements as in our previous map, with additional arguments specifying the legend title and text size.

map1 <- tm_shape(counties_proj) +
            tm_borders() +
            tm_fill("AWATER", title = "Water Area (sq. m)") +
            tm_text("NAME", size = 0.7)
map1

plot of chunk tmap_show

Since we saved our thematic map in a variable map1, we can recall it and add more map elements, as shown below.

map1 +
    tm_style_classic(legend.frame = TRUE) +
    tm_scale_bar(position = c("left", "top"))

plot of chunk tmap_add

Note that the tm_style_ functions change the overall theme of the plot.

Exercise

Color scales in tmap are divided into bins by default.

Consult the help file ?tm_fill to find which argument controls how the scale is divided. Can you change it to a continuous gradient instead of bins?

Note that a single thematic map can combine data from multiple vector or raster objects. You can find more tmap examples in the tmap in a nutshell R vignette.

Top of Section


Interactive maps with Leaflet

The leaflet package is a R interface to the leaflet JavaScript library. It produces interactive maps (with controls to zoom, pan and toggle layers) combining local data with base layers from web mapping services.

The leaflet() function creates an empty leaflet map to which layers can be added using the pipe (%>%) operator. The addTiles functions adds a base tiled map; by default, it imports tiles from OpenStreetMap. We center and zoom the map with setView.

library(leaflet)
imap <- leaflet() %>%
            addTiles() %>%
            setView(lng = -76.505206, lat = 38.9767231, 
                    zoom = 7)
imap

Switch to the “Viewer” tab in RStudio to see the result.

leaflet1

To show how leaflet can access open data from various web mapping services (WMS), we add real-time weather radar data from the Iowa Environmental Mesonet.

imap %>%
    addWMSTiles(
        "http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi",
        layers = "nexrad-n0r-900913", group = "base_reflect",
        options = WMSTileOptions(format = "image/png", transparent = TRUE),
        attribution = "Weather data © 2012 IEM Nexrad"
    )

Use the map controls to zoom away from the current location and find ongoing storm events.

Top of Section


Leaflet maps in Shiny

Since leaflet outputs a JavaScript object, it integrates particularly well with Shiny applications. Leaflet maps can be embedded in Shiny with the leafletOutput (UI-side) and renderLeaflet (server-side) functions.

Open the lesson-7-2.R handout. The first part of the code loads libraries, then imports the counties_md shapefile and nlcd_proj raster. The latter has been pre-projected to Web Mercator for compatibility with leaflet.

While leaflet performs some projection conversions automatically, fixing raster projections in advance can greatly reduce the app loading time.

# Libraries
library(rgdal)
library(raster)
library(leaflet)

# Load data
counties_md <- readOGR("_slides/data/cb_500k_maryland", "cb_500k_maryland")
nlcd_proj <- raster("_slides/data/nlcd_proj.grd")

Defining the UI

For the sake of this demonstration, our user interface will be composed of a single leafletOutput object inside a fluidPage (a flexible Shiny layout).

ui <- fluidPage(title = "Interactive Map",
    leafletOutput("sesync_map")
)

Defining the server

In the server function, we first specify that the output will be generated by a renderLeaflet.

server <- function(input, output) {
    output[["sesync_map"]] <- renderLeaflet({

The following setView and addTiles command produce the same base map as earlier. We then add a point marker at the location of SESYNc, with a popup description

        addMarkers(lng = -76.505206, lat = 38.9767231, 
                   popup = "SESYNC") %>%

Adding polygon and raster layers

We then add the county outlines with addPolygons and the land cover information with addRasterImage.

        addPolygons(data = counties_md, fill = FALSE)  %>%
        addRasterImage(nlcd_proj, opacity = 0.7, 
                       project = FALSE)

Here is the complete server function. You are now ready to run the app.

server <- function(input, output) {
    output[["sesync_map"]] <- renderLeaflet({
        leaflet() %>% 
            setView(lng = -76.505206, lat = 38.9767231, 
                    zoom = 8) %>%
            addTiles() %>%
            addMarkers(lng = -76.505206, lat = 38.9767231, 
                       popup = "SESYNC") %>%
            addPolygons(data = counties_md, fill = FALSE)  %>%
            addRasterImage(nlcd_proj, opacity = 0.7, 
                           project = FALSE)
    })    
}

Map layer controls

We complete this lesson by adding some interactive control to toggle specific layers on the map.

Open the lesson-7-3.R handout. It is identical to the previous app code, except for a few additional lines in the server function.

First, notice that we assigned the polygons layer to a group named “MD counties”.

        addPolygons(data = counties_md, fill = FALSE, 
                    group = "MD counties") %>%

We add two separate raster layers, each produced by masking all but a single land cover class. We put each layer in a group named for its land cover class and assign it a color.

        addRasterImage(mask(nlcd_proj, nlcd_proj == 41, 
                            maskvalue = FALSE),
                       group = "Deciduous Forest", 
                       colors = "green", project = FALSE) %>%
        addRasterImage(mask(nlcd_proj, nlcd_proj == 81, 
                            maskvalue = FALSE),
                       group = "Pasture", 
                       colors = "yellow", project = FALSE) %>%

We then call addLayersControl to add controls that toggle between the two rasters (baseGroups) with the option of overlaying the county polygons (overlayGroups).

        addLayersControl(baseGroups = c("Deciduous Forest", 
                                        "Pasture"),
                         overlayGroups = c("MD counties"))

You can now run the app and try the layer controls.

leaflet_shiny

In addition to these controls embedded in the map, it is possible to modify a map based on other input objects in the Shiny app. See the SuperZip app in the Shiny gallery for a full example.

Top of Section