Using QGIS and OSGeo4W for Geo-Data Tasks

In this post I’ll discuss creating and altering shapefiles, and converting point sets from one coordinate reference system to another. I’ll also touch on scripting these tasks for large data sets. I’ll begin with the installation of Quantum GIS and Python for manipulating geographical data. I mainly use QGIS for visualizing and building shapefiles, and I use OSGeo4W from the command line for adding/converting shapefile projections, and converting point sets from one CRS to another.

Coordinate Reference Sytems

First off, I’ll explain a bit about coordinate reference systems, because they’re sort of domain-specific. Latitude and longitude alone do not uniquely identify a location, one must also provide a coordinate reference system (CRS). There are many different ways of projecting the curved surface of the Earth onto a flat map, and many different geodetic (think Epcot) models of the Earth’s curvature. This gives rise to two types of coordinate reference systems, projections and geodetic datums. Most land surveys in the United States use the 1927 North American Datum but others, such as NAD83, are not uncommon. Google uses the WGS84 datum, but older GPS devices usually use an earlier system, which should be specified in the owner’s manual (?) or somewhere.

Installing QGIS

I am using a 64-bit installation of Windows 7. I downloaded Quantum GIS (QGIS) as a free, open source alternative to ArcGIS. That installation process, using the default options, also installs OSGeo4W and GRASS, among other things. OSGeo4W provides a very powerful set of tools that may be utilized via the command line, or leveraged in a script.

When changing the CRS of set of points, you might get an error saying that you need to set the GDAL_DATA environment variable. You should set this environment variable to the directory in your QGIS installation with files like compdsc.csv or data_shift.csv in it. For my installation, I set GDAL_DATA to C:\Program Files (x86)\Quantum GIS Lisboa\share\gdal. This may or may not work for you.

Installing Python (optional)

N.B. ArcGIS uses Python, and it installs its own version of Python somewhere on your system. This allows ArcGIS users to have Python, and avoid burdening them with installing Python themselves. Unfortunately, if you already have Python, and ArcGIS installation might mess up your pre-existing Python installation. To the best of my knowledge, the most direct solution to this problem is to re-install Python.

Python is an easy to learn, general purpose programming language. If you are running Windows, you can download a Microsoft Installer file here. There is an osgeo module for Python, but I have found bugs in it. Specifically, it will not transform points from one CRS to another, it will just give you back the same points as before.

I also recommend installing the SciPy Stack, which is a family of related scientific modules for Python. Included in the stack is the IPython Notebook which provides users with a “notebook” interface, like Mathematica. The notebook interface works from within a browser (prefereably Chrome) and gives you cells for code that may be executed individually, this lets work quickly and interactively. The notebook interface also lets you insert paragraphs, images, and interactive plots between these code cells so that you can have documentation for yourself or your colleagues. Sharing a notebook is as easy as sending your colleague an .ipynb file.

You will need to add C:\Python27;C:\Python27\Scripts; to your Path environment variable, and C:\Python27;C:\Python27\Lib\site-packages;C:\Python27\Lib;C:\Python27\DLLs;C:\Python27\Lib\lib-tk; to your PYTHONPATH environment variable, which you will have to create if you do not already have it.

At that point, you may start the IPython Notebook by entering at the command line,

ipython notebook --script --pylab=inline

The --script flag writes your work to a .py script file, and the --pylab=inline loads a plotting and visualization module, and ensures that images and plots are displayed in your browser window, and not a separate pop-up window, which is useful when you have a lot of plots.

Notes on command line tools

The command line or terminal lets you call programs. Working “interactively” means that you type commands into the terminal and view the results. If you have lots of input and/or output, you’d probably rather read input from a file and/or write your output to a file. You may specify files for input using the input redirection operator, <, and write the output to a file using the output redirection operator, >. (Examples of this are shown below.)

Command line tools respect file name paths. If an input or output file does not have a path prepended to it, then the command line will assume that that file lives in the current working directory, the directory you are “in”. You can see your working directory by typing pwd at the terminal and hitting Enter.

When output is written to a file, the command line will create that file if it does not exist, (typically) overwrite that file, or (atypically) append to the end of that file. You should be careful that you are not accidentally overwriting important files.

Coding Examples

Transfrom points from one CRS to another

First, we’ll type osgeo4w' to start OSGeo. When it gets done loading, we'll call thegdaltransform` function,

gdaltransform -s_srs EPSG:4267 -t_srs EPSG:4326 < input.csv > output.txt

-s_srs and -t_srs stand for the source and target spatial reference systems, respectively. EPSG:4267 and EPSG:4326 represent the NAD27 and WGS84 coordinate reference systems. input.csv is a list of comma separated latitude and longitude coordinates, one point per line, while output.txt is a text file with coordinates separated by spaces, again, one point per line. If output.txt does not exist before this command is called, then it will be created. If output.txt does exist, then it will be overwritten, so be careful. If the filenames for input.csv and output.csv do not contain paths, then gdaltransform will look in the current directory for input.txt and then put output.csv in the current directory. If paths are supplied, then gdaltransform will respect those paths.

Change the projection of an ESRI Shapefile

Here, dst.shp and src.shp are the destination and source files, respectively. (Caveat: the destination file comes before the source file.) If you want to change the spatial reference system of a given Shapefile, then dst.shp and src.shp will have the same name. This will overwrite the file in-place. If dst.shp and src.shp are different files, then dst.shp will be created if it does not exist, or it will overwrite soemthing(!).

ogr2ogr -s_srs EPSG:4267 -t_srs EPSG:4326 -f "ESRI Shapefile" dst.shp src.shp

Convert a GeoJSON file to an ESRI Shapefile

The neat thing about GeoJSON files is that you you can attach any number of attributes to each same point. Structurally, GeoJSON files resemble simple JSON files, which makes them easy to build and parse. As you gather more information you can store it in a GeoJSON, and then view your spatial data in QGIS and (sort-of) query the attributes as you would a database. From there, you can make all sorts of pretty color-coded maps in order to mislead people.

ogr2ogr -f "ESRI Shapefile" dst.shp src.json

Call OSGeo4w from Python

There is an osgeo module for Python, but I found that it has several bugs. Specifically, it will not transform points from one CRS to another, it will simply give you back your input unchanged. In light of this unhappy development, I used system calls to script OSGeo, described below.

The os.system() function lets you pass commands, as a (quoted) string, to the command line. In my installation, I found that I had to first call osgeo4w in order to use any of its functions. I also had to add C:\Program Files (x86)\Quantum GIS Lisboa;C:\Program Files (x86)\Quantum GIS Lisboa\bin; to my Path environment variables.

import os
os.system( 'osgeo4w;; gdaltransform  -s_srs EPSG:4267 -t_srs EPSG:4326 < input.csv > output.txt' )