Last fall I started a blog post with a simple geoprocessing project in mind. I got a chance to return to the subject and investigate it more in preparation for a presentation at CUGOS. Here’s the blog version.
There’s a number of core libraries that help drive some well known open-source software. Here’s a few in blue:
Some of these libraries are geometry engines that describe basic spatial predicates (unary and binary) as well as the basic spatial analysis methods such as Intersection, Union, Difference. Other libraries are used for interoperability — to read and write to multiple file formats, both raster and vector, among other functions.
The geometry engines in this example ( GEOS, JTS and .NETTopolgySuite(.NTS) ) have a common history. GEOS and .NTS are both ports of the Java Topology Suite (JTS) and all three are compliant with the OGC’s Simple Features Specification for SQL.
Within their respective language domains these libraries also work together under the hood of well-known applications and OSGeo projects we are all familiar with.
There’s always a need for customization or batch geoprocessing in workflows. But not everyone is a C++ or Java programmer. So we need a way to talk to these libraries. Luckily, developers produce bindings in scripting languages like Python, Perl and PHP so others can leverage this functionality. Here’s some of those bindings.
GeoScript is a set of JavaScript, Groovy, Scala and Python wrappers for the JavaTopologySuite (JTS) and GeoTools. Shapley wraps Geometry Engine Open Source (GEOS). GDAL/OGR can be built with Python bindings.
Chances are that the applications have their own libraries and bindings as well.
For example, QGIS has Python bindings. That means that you can write scripts that leverage QGIS functionality, just as you would with ArcMap and the geoprocessor. You can read and write to multiple file formats as well as do spatial analysis. QGIS is built on many libraries and plugins. It therefore might have more functionality to expose through its bindings than any single library wrapper like Shapely — don’t really know since I haven’t started using QGIS bindings yet.
The same is true of other open GIS desktop software such as gvSIG. It has a scripting engine at version 2.0 that can speak Jython, JavaScript and Groovy. And GeoDjango has its own bindings written for GEOS, PROJ4 and GDAL libraries.
The diagram above might be a cluttered and confusing graphic, but the take-home-point for GIS Specialists and those open-source newbies is you have a lot of freaking ways to access this functionality. Maybe it’s even a little overwhelming.
I work in an ESRI shop and that means I don’t get a lot of time to play with open-source tools. So I’m trying to carve out a happy, cozy, fuzzy space for open source at my job. Being a GIS Analyst/Specialist isn’t the most exciting job in the world and I have to find ways to make it more educational. So there’s a couple areas where I can insert open-source GIS.
I have a list. The first list is geoprocessing calls I can’t make without an expensive ArcInfo ArcMap license. Here’ some of those.
That said, there’s usually manual workarounds (in red). Or there might be third party installs you can use for a price. Or there’s spatial database functions that might resolve these needs. Or there’s the ArcObjects Desktop API and SDK to program these solutions in C#.NET and Visual Studio.
The point, there’s a lot of options and open-source geoprocessing is another. The second list (above) describes some things I really don’t like about ArcGIS geoprocessing. I think my biggest conflict is that the cursor methods aren’t pythonic enough. There’s this mx.ODBC Python package that used to be free. You could pass a database a select statement and get a list of tuples in return. I’m using Python and I’d like to use lists of lists instead of getters and setters. Just sayin’.
These lists led me to start playing with GeoScript bindings about 8 or 9 months ago.
So let’s go through some very very basic geoprocessing to understand a little bit more about the difference between the GeoScript, Shapely and OGR Python bindings. This isn’t a tutorial. Nothing is going to be more educational than sitting down with these libraries and trying different geoprocessing moves on your own.
I want to find a very cheap way to do Polyline to Polygon conversion (or in OGC standards terminology LineString to Polygon conversion). When I say cheap I’m not talking about computationally inexpensive. I’m talking cheap like Walmart. Example 1 is the cheapest way I can think of to do the conversion.
So what is Well Known Text (WKT) string replacement? The best thing to do is show a quick example. Let’s use an OpenLayers example to illustrate the conversion. If we go the example page on the OpenLayers web site and filter for WKT we should get one example back.
We can draw shapes on the map and then view their WKT or GeoJson data structures in the sidebox. Or we can cut and paste WKT and GeoJson to see what the outcome is going to be. In the picture below I drew a quick LineString. Naturally, the WKT representation to the right shows this as a LineString with it’s vertices.
To convert this LineString to a Polygon all we have to do is replace the world ‘LineString’ with the word ‘Polygon’ and slap an extra pair of brackets around the existing brackets. Like so.
That’s all we’re going to do in this simple GeoScript script. We’re going to read in a shapefile of our test data. Then we’re going to export each geometry into WKT and do a string.replace() method call. All LineString(s) will get replaced with Polygon plus a set of opening and closing brackets. All MultiLineString(s) will get replaced with just Polygon. Then we’ll create geometries based on those new WKT representations and write them out to a shapefile. Here’s the core function of the script that shows the reading, serializing, deserializing and writing:
def p2pWKT(inshp,outshapeDIR):
shp = Shapefile(inshp)
polySchema = changeSchemaType(os.path.splitext(os.path.split(inshp)[1])[0],geom.Polygon,shp)
ws = Directory(outshapeDIR)
polyLayer = ws.create(schema=polySchema)
for f in shp.features():
try:
dictFeat = dict(f.iteritems())
geomWKT = writeWKT(f.geom)
if geomWKT.count('MULTI') != 0:
polyWKT = geomWKT.replace('MULTILINESTRING ', 'POLYGON ')
else:
polyWKT = geomWKT.replace('LINESTRING', 'POLYGON (')+ ")"
#Replace
dictFeat['the_geom'] = polyWKT
newFeat = polyLayer.schema.feature(dictFeat,f.id)
#Report da shapez
print ": "+writeWKT(f.geom)
print "\n: "+ polyWKT
print "\n: "+writeWKT(newFeat.geom)
polyLayer.add(newFeat)
except TypeError, err:
print "\n",err
continue
Let’s take a look at our test data.
The LineStrings on the top are shapes that once they become Polygons I’m guessing will be pretty compatible with the OGC Simple Features Standard. The shapes on the bottom row are sketchy by the same standard. Let’s run the script to see how GeoScript handles them.

There’s a couple points I want to make here about the output. All the shapes did exactly what I expected them to. GeoScript managed to do the substitution on the top shapes and balked at the bottom ones. So that means that If we go look at the bash shell we’ll see the underlying library screaming at us that it can’t do what we already didn’t expect it to do (if you look closely at the last line of the bash shell picture below you’ll see the geometry couldn’t be coerced into com.vividsolutions.jts.geom.Geometry).
What’s wrong with the bottom shapes? A simplified version of the OGR Simple Feature Spec validation rules for polygons might go like this 1) Polygon cannot cross itself, 2) All inner rings cannot touch or overlap, 3) Inner rings can only touch the exterior shell at one point. There’s probably more nuanced geometrical assertions too, but that’s all we’ll need to talk about here.
Now let’s move to example 2 and do the exact same workflow but using OGR bindings. Why? Well, there’s a chance that the libraries make different assumptions about reading and writing geometries. Here’s the main function to do this work:
def polyline2Polygon(inShapeDIR, outShapeDIR):
ogr.UseExceptions()
try:
#GET POLYLINE SHAPFILE
inshp = ogr.Open(inShapeDIR)
inlyr = inshp.GetLayer()
#CREATE POLYGON SHAPEFILE
drv = ogr.GetDriverByName( "ESRI Shapefile" )
outshp = drv.CreateDataSource( outShapeDIR )
outlyr = outshp.CreateLayer(os.path.splitext(os.path.split(outShapeDIR)[1])[0], None, ogr.wkbPolygon )
#FOR EACH POLYLINE FEATURE CREATE A POLYGON
for n in range( 0 , inlyr.GetFeatureCount() ):
infeat = inlyr.GetFeature(n)
export = infeat.geometry().ExportToWkt()
if export.count('MULTI') != 0:
outgeoT = export.replace("MULTILINESTRING" , "POLYGON " )
else:
outgeoT = export.replace("LINESTRING", "POLYGON (") + ")"
outgeo = ogr.CreateGeometryFromWkt(outgeoT)
#REPORT DA SHAPEZ
print ": "+export
print "\n: "+outgeoT
print "\n: "+ str(outgeo)
featOut = ogr.Feature(outlyr.GetLayerDefn())
featOut.SetGeometry(outgeo)
outlyr.CreateFeature(featOut)
featOut.Destroy()
except:
print "polyline2Polygon() FAIL boo...."
finally:
drv = None
gdal.ErrorReset()

Whoa. That’s different than what I expected. OGR was able to create output for each of the geometries, even the screwball ones. Why would this happen? Let’s talk about a few things of interest. Look at what OGR did with the bottom left-most shape and the one to the right of it . It added an extra vertex and closed a LineString that was topologically unclosed in both cases. That’s exactly what OpenLayers does too! Not what I expected, or maybe even wanted, but it’s cool.
The second thing of interest is those funky Polygon shapes on the bottom row. How do they exist?
Above I said something about assertion rules for geometry in the OGC Simple Features Spec. How could we test if these geometries are valid by OGC Simple Features Spec standards? One method would be to use a tool in QGIS. Or there’s unary predicates in OGR, Shapely and GeoScript for each geometry that ask, “Am I valid?” and return a boolean.
The QGIS tool will list what’s wrong with each geometry (not sure if the validation rules toe the OGC Simple Feature Spec) like below. QGIS also used to prohibit editing geometries that were funky. In the newest version that has changed and even funky shapes are editable.
Of course the red shape is invalid for the reasons the tool points out.
The purple shape is an interesting one. The validation tool says it’s valid. In previous versions of QGIS this shape would be invalid. So it looks like QGIS might be interpreting this as a Polygon with a single touching node between the parts. Not sure.
I’m also uncertain why QGIS thinks this blue polygon has an unnested inner ring. There’s only one shared vertex and this should be allowed according to the Simple Features Spec.
I have two guesses why OGR is able to write these weird geometries out. My first guess is that GDAL/OGR was around before the Simple Features Specification. So while they offer simple feature access, they don’t necessary follow OGC standards strictly and will read and write a spectrum geometries.
Another guess I have is that the Simple Feature Spec mentions that each implementing library can handle validation checks and exceptions in their own desired way. Some, like OGR, might topologically close Polygons that have a border gap. Others, like JTS, might decide that an unclosed Polygon is a complete violation and throw an exception. Not entirely sure.
The take home point is that different libraries might read and write geometries, especially weird ones, differently. So pay attention.
Let’s see what would happen if we read the super funky red geometry back in with OGR.
There it is. OGR reads the geometry it created but throws a warning (so not a complete exception like GeoScript and JTS). The warning says, if you can see it, that there could be numerous problems with this feature including it might be a non-Polygon geometry. Exactly OGR, exactly.
The third example we are going to show deals with the Python bindings Shapely that wrap GEOS.
Shapely is a little bit different than OGR and GeoScript. One big difference is that you can’t read and write to file formats with it. The idea, at least the idea that I’ve taken from it’s website, is that Shapely is a “swiss-army-knife” spatial analysis tool — it does some things really really fast and pythonic but it isn’t going to fix a flat tire. Or something like that.
The coolo-neato things about Shapely include the ability to serialize/deserialize to/from data formats that implement a GeoJson-like interface and to/from numpy arrays. There’s also a couple helpful methods — linemerge and polygonize — that it wraps from the GEOS library.
These methods aren’t as easy to reach in GeoScript or OGR built with GEOS support off the proverbial ‘shelf’ because you’d have to call Java libraries in the case of GeoScript and C++ libraries using Python ctypes. Shapely makes using these methods easy, even though you could develop your own wrappers to get at them in GeoScript and OGR with GEOS support.
I want to say one last thing about Shapely, though it might reflect some naivete on my part about the functionality of this tool. It seems to me that you can do about 80% of what Shapely does if you build OGR against GEOS and use those bindings. Although it might be more pythonic, optimized in certain ways and syntactically simpler, I’m still unsure who the users are and why. I think the TileStache project uses this library on the backend. I’m interested to find more examples.
Since Shapely can’t read or write geometries we have to lean on an existing interop library. Example 3 will use OGR for reads and writes.
The idea behind the third example is simple. I want to do LineString to Polygon conversion again, but differently. There’s probably tons of ways to do this. The best option might be to use JTS’s QuadEdge data type or Topology. But I want to use Shapely.
I knew JTS had a Polygonizer and PolygonBuilder class. Since GEOS is a port of JTS, then it should naturally have similar classes. One day I was looking at Shapely’s API and I came across this method shapely.ops.polygonize(). Let’s see what that does on our test data. Here’s the main functions:
def ogrWkt2Shapely(inShpDIR):
shapelyObjects=[]
shp = ogr.Open(inShpDIR)
lyr = shp.GetLayer()
for n in range(0, lyr.GetFeatureCount()):
feat = lyr.GetFeature(n)
sFeat = loads(feat.geometry().ExportToWkt())
shapelyObjects.append(sFeat)
return shapelyObjects
def writeShp2Ogr(shp, outshpPATH, ogrType=ogr.wkbPolygon):
drv = ogr.GetDriverByName( "ESRI Shapefile")
ds = drv.CreateDataSource(os.path.split(outshpPATH)[0])
lyr = ds.CreateLayer(os.path.splitext(os.path.split(outshpPATH)[1])[0], None, ogrType)
newgeom = ogr.CreateGeometryFromWkt(shp.wkt)
feat = ogr.Feature(lyr.GetLayerDefn())
feat.SetGeometry(newgeom)
lyr.CreateFeature(feat)
feat.Destroy()
def p2pPolygonize(shpPATH, outshpPATH):
shps = ogrWkt2Shapely(shpPATH)
try:
polyList = list(polygonize(shps[0]))
counter = 0
for shp in polyList:
writeShp2Ogr(shp, pthOutput + "TEST"+str(counter)+"_"+os.path.splitext(os.path.split(outshpPATH)[1])[0]+".shp")
counter+=1
except Exception, err:
print "\n",err
return
def runTests():
shapes = ["/WORKSPACE/p2p/DATA/tests/2_MLS.shp", "/WORKSPACE/p2p/DATA/tests/3_MLS.shp"]
shapes.sort()
deleteOutput()
for shp in shapes:
print pthOutput + " --> " + os.path.split(shp)[1]
p2pPolygonize(shp , pthOutput + os.path.split(shp)[1])
print "\n\n\n"

shapely.ops.polygonize() can take a list of LineStrings and return Polygons. But I handed polygonize() a single Polygon geometry and it returned me three shapes — two inner ring polygons and the outer shell. A better explanation of what’s it’s doing can be found in the GEOS API. Regardless of why, this is pretty cool.
So how can we use polygonize() to our benefit? Well, what would happen if I passed the method a MultiPolygon? That’s the question behind example 4.
We want a cheap split function that can split any arbitrary Polygon. We’ll talk about the limitations of what we’re about to do in a second. But imagine that we want to split this polygon representing Mercer Island, Washington with the LineString on top of it.
One way to do this is to intersect our LineStrings with our Polygon. This returns us just the portion of the lines that overlap with Mercer Island. Then we buffer these lines 0.000001 of a meter (yes, it’s possible). Then we’ll do a difference (opposite of intersect) on our buffered lines (a polygon feature now) and Mercer Island polygon. This produces a MultiPolygon feature that looks like this:
The highlighted MultiPolygon feature is actually split even though it’s hard to see. Remember, that’s one record in the attribute table. This isn’t really a good outcome since I’d probably only want to split polygons with LineStrings to divide up an area into discrete parts and work with each part separately. Haha, let’s use polygonize() and see if it will return a collection of single polygons from our MultiPolygon geometry.
import osgeo.ogr as ogr
import osgeo.gdal as gdal
import shapely
from shapely.wkt import loads
import os, string, glob, sys
from shapely.ops import linemerge
from shapely.ops import polygonize
from shapely.geometry import asShape, asLineString, asMultiLineString
from shapely.geometry import LineString, Polygon, MultiPolygon,Point
pthSplitLines = "/WORKSPACE/p2p/DATA/topo/split_polylines.shp"
pthMercer = "/WORKSPACE/p2p/DATA/topo/mercer_island.shp"
pthOutput = "/WORKSPACE/p2p/DATA/output/topo/"
pthTopo = "/WORKSPACE/p2p/DATA/topo/topo020.shp"
def readOgr2Shply(inShpDIR):
shapelyObjects=[]
shp = ogr.Open(inShpDIR)
lyr = shp.GetLayer()
for n in range(0, lyr.GetFeatureCount()):
feat = lyr.GetFeature(n)
sFeat = loads(feat.geometry().ExportToWkt())
shapelyObjects.append(sFeat)
return shapelyObjects
def writeShply2Ogr(listShapelyShapes, outshpPTH, ogrWkbType=ogr.wkbPolygon):
drv = ogr.GetDriverByName("ESRI Shapefile")
ds = drv.CreateDataSource(os.path.split(outshpPTH)[0])
lyr = ds.CreateLayer(os.path.splitext(os.path.split(outshpPTH)[1])[0], None, ogrWkbType)
for shp in listShapelyShapes:
ogrGeo = ogr.CreateGeometryFromWkt(shp.wkt)
feat = ogr.Feature(lyr.GetLayerDefn())
feat.SetGeometry(ogrGeo)
lyr.CreateFeature(feat)
feat.Destroy()
def deleteOutput():
for shp in glob.glob(pthOutput+"*"):
os.remove(shp)
def runPolygonize():
lines = readOgr2Shply(pthTopo)
writeShply2Ogr(polys, pthOutput + os.path.split(pthTopo)[1])
def runLineMerge():
lines = readOgr2Shply(pthTopo)
polys = list(linemerge(lines))
writeShply2Ogr(polys, pthOutput + os.path.split(pthTopo)[1],ogr.wkbMultiLineString)
def writeSingleShape(listShapelyShapes, outshpPTH, ogrWkbType=ogr.wkbLineString):
drv = ogr.GetDriverByName("ESRI Shapefile")
ds = drv.CreateDataSource(outshpPTH)
lyr = ds.CreateLayer(os.path.splitext(os.path.split(outshpPTH)[1])[0], None, ogrWkbType)
for shp in listShapelyShapes:
ogrGeo = ogr.CreateGeometryFromWkt(shp.wkt)
feat = ogr.Feature(lyr.GetLayerDefn())
feat.SetGeometry(ogrGeo)
lyr.CreateFeature(feat)
feat.Destroy()
def runSplit():
deleteOutput()
splits = readOgr2Shply(pthSplitLines)
splits = splits[0]
poly = readOgr2Shply(pthMercer)
poly = poly[0]
intr = poly.intersection(splits)
diff = [poly.difference(intr.buffer(0.0001))]
print "\n\n"
print diff
print "\n\n"
writeShply2Ogr(diff,pthOutput + "POLYGON_SPLIT.shp", ogrWkbType=ogr.wkbMultiPolygon)
splits = readOgr2Shply(pthSplitLines)
splits = splits[0]
poly = readOgr2Shply(pthMercer)
poly = poly[0]
intr = poly.intersection(splits)
diff = list(polygonize(poly.difference(intr.buffer(0.0001))))
print "\n\n"
print diff
print "\n\n"
writeShply2Ogr(diff,pthOutput + "POLYGON_SPLIT_POLYGONIZER.shp")
See what I mean? That’s pretty useful. It did return me each Polygon that made up the MultiPolygon. Now the bad news. This is a cheap method. If you were going to use this output for analysis you’d have to be very careful. Each polygon doesn’t share an edge along the splits. We buffered our lines remember? If you zoom way in you’ll see this:
The better way to do this might be using QuadEdge data types or Topology classes in GEOS and JTS. Maybe this is the next step — an unwritten example 5. We’ll see.
For the real brainiacs out there, you could write your own algorithm that traverses the edge of one polygon and for each segments asks if the split lines intersect it, split them at intersecting points, save off segments and reconstruct polygons. But that’s a way harder make-my-brain-hurt maths geometry stuffs.