In this post I’ll discuss creating and altering shapefiles, and converting point sets from one coordinate reference system to another. I’ll also touch on scripting these tasks for large data sets. I’ll begin with the installation of Quantum GIS and Python for manipulating geographical data. I mainly use QGIS for visualizing and building shapefiles, and I use OSGeo4W from the command line for adding/converting shapefile projections, and converting point sets from one CRS to another.
Coordinate Reference Sytems
First off, I’ll explain a bit about coordinate reference systems, because they’re sort of domain-specific. Latitude and longitude alone do not uniquely identify a location, one must also provide a coordinate reference system (CRS). There are many different ways of projecting the curved surface of the Earth onto a flat map, and many different geodetic (think Epcot) models of the Earth’s curvature. This gives rise to two types of coordinate reference systems, projections and geodetic datums. Most land surveys in the United States use the 1927 North American Datum but others, such as NAD83, are not uncommon. Google uses the WGS84 datum, but older GPS devices usually use an earlier system, which should be specified in the owner’s manual (?) or somewhere.
Installing QGIS
I am using a 64-bit installation of Windows 7. I downloaded Quantum GIS (QGIS) as a free, open source alternative to ArcGIS. That installation process, using the default options, also installs OSGeo4W and GRASS, among other things. OSGeo4W provides a very powerful set of tools that may be utilized via the command line, or leveraged in a script.
When changing the CRS of set of points, you might get an error saying that you need to set the GDAL_DATA
environment variable. You should set this environment variable to the directory in your QGIS installation with files like compdsc.csv
or data_shift.csv
in it. For my installation, I set GDAL_DATA
to C:\Program Files (x86)\Quantum GIS Lisboa\share\gdal
. This may or may not work for you.
Installing Python (optional)
N.B. ArcGIS uses Python, and it installs its own version of Python somewhere on your system. This allows ArcGIS users to have Python, and avoid burdening them with installing Python themselves. Unfortunately, if you already have Python, and ArcGIS installation might mess up your pre-existing Python installation. To the best of my knowledge, the most direct solution to this problem is to re-install Python.
Python is an easy to learn, general purpose programming language. If you are running Windows, you can download a Microsoft Installer file here. There is an osgeo
module for Python, but I have found bugs in it. Specifically, it will not transform points from one CRS to another, it will just give you back the same points as before.
I also recommend installing the SciPy Stack, which is a family of related scientific modules for Python. Included in the stack is the IPython Notebook which provides users with a “notebook” interface, like Mathematica. The notebook interface works from within a browser (prefereably Chrome) and gives you cells for code that may be executed individually, this lets work quickly and interactively. The notebook interface also lets you insert paragraphs, images, and interactive plots between these code cells so that you can have documentation for yourself or your colleagues. Sharing a notebook is as easy as sending your colleague an .ipynb
file.
You will need to add C:\Python27;C:\Python27\Scripts;
to your Path
environment variable, and C:\Python27;C:\Python27\Lib\site-packages;C:\Python27\Lib;C:\Python27\DLLs;C:\Python27\Lib\lib-tk;
to your PYTHONPATH
environment variable, which you will have to create if you do not already have it.
At that point, you may start the IPython Notebook by entering at the command line,
ipython notebook --script --pylab=inline
The --script
flag writes your work to a .py
script file, and the --pylab=inline
loads a plotting and visualization module, and ensures that images and plots are displayed in your browser window, and not a separate pop-up window, which is useful when you have a lot of plots.
Notes on command line tools
The command line or terminal lets you call programs. Working “interactively” means that you type commands into the terminal and view the results. If you have lots of input and/or output, you’d probably rather read input from a file and/or write your output to a file. You may specify files for input using the input redirection operator, <
, and write the output to a file using the output redirection operator, >
. (Examples of this are shown below.)
Command line tools respect file name paths. If an input or output file does not have a path prepended to it, then the command line will assume that that file lives in the current working directory, the directory you are “in”. You can see your working directory by typing pwd
at the terminal and hitting Enter.
When output is written to a file, the command line will create that file if it does not exist, (typically) overwrite that file, or (atypically) append to the end of that file. You should be careful that you are not accidentally overwriting important files.
Coding Examples
Transfrom points from one CRS to another
First, we’ll type osgeo4w' to start OSGeo. When it gets done loading, we'll call the
gdaltransform` function,
osgeo4w gdaltransform -s_srs EPSG:4267 -t_srs EPSG:4326 < input.csv > output.txt
-s_srs
and -t_srs
stand for the source and target spatial reference systems, respectively. EPSG:4267
and EPSG:4326
represent the NAD27 and WGS84 coordinate reference systems. input.csv
is a list of comma separated latitude and longitude coordinates, one point per line, while output.txt
is a text file with coordinates separated by spaces, again, one point per line. If output.txt
does not exist before this command is called, then it will be created. If output.txt
does exist, then it will be overwritten, so be careful. If the filenames for input.csv
and output.csv
do not contain paths, then gdaltransform
will look in the current directory for input.txt
and then put output.csv
in the current directory. If paths are supplied, then gdaltransform
will respect those paths.
Change the projection of an ESRI Shapefile
Here, dst.shp
and src.shp
are the destination and source files, respectively. (Caveat: the destination file comes before the source file.) If you want to change the spatial reference system of a given Shapefile, then dst.shp
and src.shp
will have the same name. This will overwrite the file in-place. If dst.shp
and src.shp
are different files, then dst.shp
will be created if it does not exist, or it will overwrite soemthing(!).
osgeo4w ogr2ogr -s_srs EPSG:4267 -t_srs EPSG:4326 -f "ESRI Shapefile" dst.shp src.shp
Convert a GeoJSON file to an ESRI Shapefile
The neat thing about GeoJSON files is that you you can attach any number of attributes to each same point. Structurally, GeoJSON files resemble simple JSON files, which makes them easy to build and parse. As you gather more information you can store it in a GeoJSON, and then view your spatial data in QGIS and (sort-of) query the attributes as you would a database. From there, you can make all sorts of pretty color-coded maps in order to mislead people.
osgeo4w ogr2ogr -f "ESRI Shapefile" dst.shp src.json
Call OSGeo4w from Python
There is an osgeo module for Python, but I found that it has several bugs. Specifically, it will not transform points from one CRS to another, it will simply give you back your input unchanged. In light of this unhappy development, I used system calls to script OSGeo, described below.
The os.system()
function lets you pass commands, as a (quoted) string, to the command line. In my installation, I found that I had to first call osgeo4w
in order to use any of its functions. I also had to add C:\Program Files (x86)\Quantum GIS Lisboa;C:\Program Files (x86)\Quantum GIS Lisboa\bin;
to my Path environment variables.
import os os.system( 'osgeo4w;; gdaltransform -s_srs EPSG:4267 -t_srs EPSG:4326 < input.csv > output.txt' )