The Landscape of Spatial Data Tools

Instructor: Kelly Hondula

GIS software is designed to capture, manage, analyze, and display (on a map!) all forms of geographically referenced information. Before diving deep into software, we review basic concepts for geographical data and the associated open source tools.

Top of Section


Geographical Data

Breaking down these uses of GIS, we find the usual components of any data driven project. The strong emphais on geographical marks the special attention that spatial attributes of data require.

capture record data with sufficient context to be useful
manage work with vast amounts of data, often captured in different contexts
analyze rely on the inherent spatial relationships between data
display conveniently display data in its native form (on a map!)

So What?


Source


Source

Why Open Source

The excellent and canonical alternative is ESRI’s ArcGIS platform.

Top of Section


Objectives for this lesson

Specific achievements

Top of Section


Managing Geographical Data

Raster Vector
matrix table
surface shape
pixels features
bands attributes

Raster File Formats

Some raster formats are generic image formats, and can be opened with a system image viewer or inserted into documents as images.

Vector File Formats

Vector formats, even if the information is stored as plain text, need software to lay out the features as an image.

Top of Section


Geospatial Data Abstraction Library (GDAL)

GDAL interprets between a vast collection of storage formats and applications that read or write spatial data. It has a single abstract data model for rasters, and another for vector data. Because GDAL knows all about different formats, it includes many tools for translating and processing geographical data.

gdalinfo --formats
Supported Formats:
  VRT (rw+v): Virtual Raster
  GTiff (rw+vs): GeoTIFF
  NITF (rw+vs): National Imagery Transmission Format
  ...

For historical reasons gdal* commands work with rasters, and ogr* commands work with vectors.

ogrinfo --formats

Supported Formats:
  -> "ESRI Shapefile" (read/write)
  -> "MapInfo File" (read/write)
  -> "UK .NTF" (readonly)
  ...

Looking at Metadata (Rasters)

cd %sandbox%
cd data
gdalinfo natural-earth.tif
Driver: GTiff/GeoTIFF
Files: natural-earth.tif
Size is 8100, 4050
Coordinate System is:
GEOGCS["WGS 84",
  DATUM["WGS_1984",
    SPHEROID["WGS 84",6378137,298.257223563,
      AUTHORITY["EPSG","7030"]],
    AUTHORITY["EPSG","6326"]],
  PRIMEM["Greenwich",0],
  UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.044444444444440,-0.044444444444440)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DATETIME=2014:10:18 11:51:41
  TIFFTAG_RESOLUTIONUNIT=2 (pixels/inch)
  TIFFTAG_SOFTWARE=Adobe Photoshop CC 2014 (Macintosh)
  TIFFTAG_XRESOLUTION=72
  TIFFTAG_YRESOLUTION=72
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
Center      (  -0.0000000,   0.0000000) (  0d 0' 0.00"W,  0d 0' 0.00"N)
Band 1 Block=256x256 Type=Byte, ColorInterp=Red
Band 2 Block=256x256 Type=Byte, ColorInterp=Green
Band 3 Block=256x256 Type=Byte, ColorInterp=Blue															

Looking at Metadata (Vectors)

ogrinfo -ro -so ne_10m_urban_areas
INFO: Open of `ne_10m_urban_areas'
  using driver `ESRI Shapefile' successful.
1: ne_10m_urban_areas (Polygon)
ogrinfo -ro -so ne_10m_urban_areas ne_10m_urban_areas
INFO: Open of `ne_10m_urban_areas'
using driver `ESRI Shapefile' successful.
	  
Layer name: ne_10m_urban_areas
Geometry: Polygon
Feature Count: 11878
Extent: (-157.991183, -51.053037) - (178.033834, 69.769855)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
  DATUM["WGS_1984",
    SPHEROID["WGS_84",6378137.0,298.257223563]],
  PRIMEM["Greenwich",0.0],
  UNIT["Degree",0.017453292519943295]]
scalerank: Integer (10.0)
featurecla: String (50.0)
area_sqkm: Real (13.3)

Translation

In addition to reading spatial data in all those formats, GDAL provides capabilities for writing data in new formats. Translations can also occur between the same format with a step that modifies the data or metadata.

Write metadata about a raster object that is missing from the world.png raster.

gdal_translate -a_srs WGS84 -a_ullr -180 90 180 -90 world.png geoworld.tif
Question
Compare the output of gdalinfo for the original world.png and new geoworld.tif raster files. What did the “corner coordinates” property of the PNG file correspond to before translation? And after translation?

The spatial data processing tools provided by GDAL are executed by giving “arguments” to the gdal_translate, which take the form -flag param1 param2 ..., which is how most command line utilities work.

The flag -outsize takes two parameters for the horizontal and vertical output size (optionally as percentages of the input size) in pixels.

gdal_translate -outsize 10% 10% geoworld.tif geoworld-small.tif

The flag -projwin takes four parameters that specify the new corners of the output file, giving a subwindow of the input file.

gdal_translate -projwin -180 90 0 0 geoworld.tif geonw.tif
Exercise
Choose coordinates that bound your home state or country, and create a new file called natural-earth-home.tif that only includes that area of the natural-earth.tif. The order of parameters is upper left “x”, upper left “y”, lower right “x”, lower right “y”.
Question
Why bother with this approach over cropping the image with an image editor?

Conversion

Here’s a handy little utility. Ever want to quickly peek at a shapefile? Why not convert it to a raster at the command line?

gdal_rasterize -ot Byte -burn 255 -tr 0.01 0.01 -l ne_10m_urban_areas ne_10m_urban_areas urban-areas.tif
Exercise
Examin the metadata of urban-areas.tif using the -stats flag. This provides additional information, but does it look right? gdal_rasterize is very literal, it set polygons to 255 and everything else to 0, whereas areas outside the polygons might be more appropriately set to “Null”. Read the docs to decide how it can be done. Try to re-create the raster so that the mean value is 255.

Top of Section


Geometry Engine Open Source (GEOS)

GEOS works on vectors, also known as geometric objects, and adds advanced GIS functionalities (topological operations) to complement the essential reading, writing and translating utilities in GDAL.

Like GDAL, GEOS is primarily intended to be a library used by other software. Unlike GDAL, GEOS does not provide its own collection of command line utilities (like gdalinfo).

One piece of software using GEOS is PostgreSQL with the PostGIS extension.

Log back into http://pgstudio.research.sesync.org

In the SQL window we can execute PostGIS functions that rely on the GEOS libary. A simple example is the function ST_Length, where ST indicates the function accepts spatial types as arguments.

SELECT ST_Length(ST_GeomFromText('LINESTRING(0 0, 1 1)'));
    st_length
-----------------
1.4142135623731
(1 row)

A very common vector operation in spatial analysis is to create a buffer around a spatial object.

SELECT ST_AsText(ST_Buffer(ST_GeomFromText('POINT(0 0)'), 1, 1));
    st_astext_
----------------
POLYGON((1 0, 0 -1, -1 0, 0 1, 1 0))
Exercise
Calculate an approximation to Pi, the ratio of a circle’s circumference to its diamter, using GEOS functionality built into PostGIS. The function ST_Perimeter gives you the length of the perimiter of a polygon.

Naturally, this is all more interesting with actual data.

SELECT id, ST_AsText(ST_Transform(restaurants.geom, 4326))
FROM ch01.restaurants
JOIN ch01.highways
ON ST_Intersects(
    ST_Buffer(restaurants.geom, 100), highways.geom)
WHERE feature like '%Toll Road';
 id   |        st_astext
------+-------------------------
22310 | POINT(-75.2094 40.6943)
25011 | POINT(-121.499 45.7115)
(2 rows)
Question
Has anybody actually got an answer? Thirty simultaneous spatial queries might be a bit much for our toy server.

Top of Section


Geographic Coordinate Systems (GCS)

A geographic coordinate system (GCS) is a reference system that makes it possible to specify locations on the surface of the earth by a latitude and longitude.

An angular unit of measure
The angular unit of measure is usually a degree, or one 360th of a circle, and fractional degrees are usually specified as decimals.
A prime meridian
The location of 0° longitude, e.g. the Royal Observatory, Greenwich, UK
A datum
An apprimation to the shape of the earth’s surface and where (in the GCS being defined) certain fixed positions are located. A spheroid (a flattish sphere) is used to approximate this shape, and the reference positions anchor its surface to known points.

Spheroids and Geoids

Spheroids and Geoids

Datum

Every geographic coordinate system begins with a precisely surveyed starting point.

Question:
Examine the output of gdalinfo for natural-earth.tif again (see reminder below). What is the name of Geographic Coordinate System (GEOGCS)? What about the name of the spheroid? Are these different?
gdalinfo natural-earth.tif

Top of Section


Projected Coordinate Systems (PCS)

A projected coordinate system is defined by four components:

Question
What GCS is used in westernfires_vir_2015231_geo.tif? What is the map projection?

Why use a projected coordinate system?

Map Projections

Converts the earth’s three-dimensional surface to a map’s two-dimensional surface. This mathematical transformation is commonly referred to as a map projection.

Using GDALs ability to re-project raster images with gdalwarp, let’s compare two rasters.

gdalsrsinfo -o wkt urban-areas.tif > urban-area.prj
gdalwarp -t_srs urban-areas.prj westernfires_vir_2015231_geo.tif westernfires_vir_2015231_geo-prj.tif
gdalinfo westernfires_vir_2015231_geo-prj.tif
gdal_translate -projwin -139.2611826 57.8698551 -80.7311826 22.2398551 urban-areas.tif urban-areas-west.tif

Top of Section


Distortion

Projected coordinate systems – they distort the world.

Four spatial properties subject to distortion in PCS:

The most recognizable distortion is the result of treating latitude and longitude values as if they were the “x” and “y” values of a projected coordinate system.

Choose Projections Wisely!

Name Feature that is Preserved
Conformal Shape
Equal Area Area
Equidistant Distance between one or two foci and every other point
Azimuthal Direction from one of two points to every other point

Projections that Preserve Shape

Conformal projection: all local angles measured from a point are correct and all local shapes are true.

Use conformal projection when the map’s main purpose involves measuring angles, showing accurate local directions, or representing the shapes of features or contour lines. This category includes:

Projections that Preserve Area

Equal-area projection: the size of any area on the map is in true proportion to its size on the earth. You should use equal-area projections to show:

How does this impact your spatial analysis?

Unless explicitly calculating geometric attributes on a curved surface, the projection used will affect the result of any such calculations. For a “planar” calculation of the area of the USA, for example, different projections give different results.

Top of Section


Summary

The open source stack has two essential libraries at the bottom: GDAL of reading, writing and translating raster and vector data, and GEOS for geometric calculations on vector data.

We will move up the stack to software interacting with these libraries in the next lesson, but the command line utilities offered by GDAL are available for batch processing of spatial data using a shell script.

Projections distort the world - be aware of the spatial reference system your spatial data employ. If possible, find spatial data in an appropriate PCS for your analysis or convert unprojected data with a known GCS to an appropriate PCS.

Top of Section