Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Archive for May, 2011

Contours– Structuring PostGIS data for viewing with GeoServer

Posted by smathermather on May 25, 2011

Naively structured data is my bane– the desire (and need) to get stuff done so often overtakes the time needed to do things the better way. So, we bootstrap.

A long time ago, we managed to load a few tens of gigs of contour data into PostGIS and partitioned it into 2ft, 10ft, 20ft, 50ft, 100ft, and 250ft tables using SELECT queries with a modulus operator, e.g.


CREATE TABLE base.cuy_contours_10
	AS
	SELECT elevation, the_geom
		FROM base.cuy_contours_2
		WHERE base.cuy_contours_2.elevation % 10 = 0;

Then we built SLDs to display the data in GeoServer, formed the data into layer groups, and away we went.

However, layer groups can often be replaced by properly structured PostGIS tables, and for large (and homogeneous) datasets like this, that's the only way to go. In addition, structuring the data properly lets us take advantage of GeoServer's ability to modify the legend based on zoom extent, which for representing contours at a range of scales is very nearly a must-have.

To structure the table, we could be disciplined and build this out as a properly normalized relational dataset, where our gid is used to determine whether a contour is divisible by 10, 20, 50, etc. While "I don't wanna" isn't a good enough reason to skip that, I think the computational overhead of a database view piecing these data back into a single table every time we access them isn't justified, given the static nature of the table. So database normalization be darned, disk space is cheap, full speed ahead. Let's add some boolean fields flagging whether a contour is divisible by each of our intervals and calculate them out.
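First the flag columns have to exist. Here's a minimal sketch of adding them, assuming none are there yet (the names match the div_* fields used in the updates and in the SLD below):

-- Sketch only: add one boolean flag column per contour interval.
ALTER TABLE base.contours_2
	ADD COLUMN div_10 BOOLEAN,
	ADD COLUMN div_20 BOOLEAN,
	ADD COLUMN div_50 BOOLEAN,
	ADD COLUMN div_100 BOOLEAN,
	ADD COLUMN div_250 BOOLEAN;

With the columns in place, calculating the flags (only the 10 and 250 cases shown; the others follow the same pattern):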


UPDATE base.contours_2
	SET div_10 = CAST( contours_2.elevation % 10 AS BOOLEAN );


UPDATE base.contours_2
	SET div_250 = CAST( contours_2.elevation % 250 AS BOOLEAN );

yields the following (with highlights added):

This is sort of the opposite of what I intended (the "falses" arguably should be "trues" and vice versa), but hey, it'll work anyway. BTW, we did test this calculation a few different ways, and while I can't remember all the details, doing it as a straight calculation rather than an UPDATE that conditionally tested the modulus was much faster, roughly 3x.
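If the inverted sense bothers you, a comparison instead of a cast flips it. A sketch for the 10 ft flag (if you go this route, the filter literals in the SLD below would need to be swapped to match):

-- Sketch: store TRUE when the elevation actually is divisible by 10,
-- instead of the inverted, cast-based flag used above.
UPDATE base.contours_2
	SET div_10 = (contours_2.elevation % 10 = 0);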

OK, so now we style an SLD for it, and sit back and enjoy (pics later…).

Posted in Database, Database Optimization, GeoServer, PostGIS, SQL

SLD for contour data

Posted by smathermather on May 25, 2011

See the other post for an explanation:

<?xml version="1.0" encoding="ISO-8859-1"?>
<StyledLayerDescriptor version="1.0.0" xmlns="http://www.opengis.net/sld"

xmlns:ogc="http://www.opengis.net/ogc"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/sld

http://schemas.opengis.net/sld/1.0.0/StyledLayerDescriptor.xsd">
<NamedLayer>
<Name>contours</Name>
<UserStyle>
<Title>contours</Title>
<Abstract>Contour lines with index</Abstract>
<FeatureTypeStyle>

<Rule>

<Name>rule01</Name>
<Title>2 ft contours</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_10</ogc:PropertyName>
<ogc:Literal>1</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>1</MinScaleDenominator>
<MaxScaleDenominator>4799</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#ffd700</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

<Rule>

<Name>rule02</Name>
<Title>10 ft contours index</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_10</ogc:PropertyName>
<ogc:Literal>0</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>1</MinScaleDenominator>
<MaxScaleDenominator>4799</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#D95F02</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

<Rule>

<Name>rule03</Name>
<Title>10 ft contours</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_10</ogc:PropertyName>
<ogc:Literal>0</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>4800</MinScaleDenominator>
<MaxScaleDenominator>7199</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#ffd700</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

<Rule>

<Name>rule04</Name>
<Title>50 foot contours index</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_50</ogc:PropertyName>
<ogc:Literal>0</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>4800</MinScaleDenominator>
<MaxScaleDenominator>7199</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#D95F02</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

<Rule>

<Name>rule05</Name>
<Title>20 ft contours</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_20</ogc:PropertyName>
<ogc:Literal>0</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>7200</MinScaleDenominator>
<MaxScaleDenominator>20999</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#ffd700</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

<Rule>

<Name>rule06</Name>
<Title>100 foot contours index</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_100</ogc:PropertyName>
<ogc:Literal>0</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>7200</MinScaleDenominator>
<MaxScaleDenominator>20999</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#D95F02</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

<Rule>

<Name>rule07</Name>
<Title>50 ft contours</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_50</ogc:PropertyName>
<ogc:Literal>0</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>21000</MinScaleDenominator>
<MaxScaleDenominator>100000</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#ffd700</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

<Rule>

<Name>rule08</Name>
<Title>250 foot contours index</Title>
<Abstract>Abstract</Abstract>

<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>div_250</ogc:PropertyName>
<ogc:Literal>0</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>

<MinScaleDenominator>21000</MinScaleDenominator>
<MaxScaleDenominator>100000</MaxScaleDenominator>

<LineSymbolizer>
<Stroke>
<CssParameter name="stroke">
<ogc:Literal>#D95F02</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linecap">
<ogc:Literal>butt</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-linejoin">
<ogc:Literal>miter</ogc:Literal>
</CssParameter>
<CssParameter name="stroke-width">
<ogc:Literal>1</ogc:Literal>
</CssParameter>

</Stroke>
</LineSymbolizer>
</Rule>

</FeatureTypeStyle>
</UserStyle>
</NamedLayer>
</StyledLayerDescriptor>

Posted in GeoServer

PostgreSQL and Fuzzy Hashes– a natural combo?

Posted by smathermather on May 20, 2011

A while back, I had the idea of creating a hash of geometry in order to create a unique constraint on geometry to avoid duplicate geometries in a PostGIS table. It works. It’s slow, but it works. I haven’t used it since.
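I won't dig up the original, but the idea boils down to something like this sketch (the table and column names here are placeholders, and md5() over the EWKT stands in for whatever hash you prefer):

-- Sketch of the geometry-hash unique constraint: a functional unique index
-- over an md5 of the geometry's text representation. Slow, because every
-- insert or update has to serialize and hash the geometry.
CREATE UNIQUE INDEX my_table_geom_hash_idx
	ON my_table (md5(ST_AsEWKT(the_geom)));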

What I really wanted, though, was to be able to detect whether geometries were similar as well (not just detect identical geometries), i.e. to test whether geometries are homologous. Wouldn't it be nice to know how similar two shapes are?

A couple of nights ago, I was introduced to a technique that may do just that: fuzzy hashes. Now, a hash is a clever way to convert anything and everything (in a computer), regardless of size, into a fixed-size number, say a 128-bit value, that is effectively unique to the particular input. This has all sorts of uses, from identifying viruses, to indexing information, to ensuring that when you download something from the internet, you are getting what you thought you were downloading in unmodified form. The defining property of a hash is that a small change to the input yields an output hash completely different from the hash of the unchanged input. So if you want to know whether two things are merely similar, a hash won't help. In fact, hashes of bad files are used in forensics, and work-arounds to hash-based forensics can be as simple as changing a single byte in a file.

Enter fuzzy hashes. In short, fuzzy hashing breaks a file up into logical pieces or blocks and hashes each block, so similarity between files can be determined block by block. This is useful for file forensics, anti-virus work, determining the origins of nefarious code, detecting plagiarism, etc. For the long version, well… just read this. Also, check out ssdeep.

The question is, can we use this in the geographic sphere? How would this be useful in, e.g., PostGIS?

Let’s give it a try:

Line geometry-- original

If we convert our purple line above to text in PostGIS:

SELECT ST_AsEWKT(the_geom) FROM trail_test;

we get something like this:

"SRID=3734;MULTILINESTRING((2179985.47870642 567446.852705703 1030.80067171231 -1.79769313486232e+308,2179982.97870642 567445.252427514 1030.79452346621 -1.79769313486232e+308,2179980.47870642 567443.652149326 1030.74616711558 -1.79769313486232e+308,2179977.97870642 567442.051871137 1030.61640858078 -1.79769313486232e+308,2179975.47870642 567440.451592949 1030.50568889286

-1.79769313486232e+308,2179794.15034441 566393.137591605 1051.18052872389 -1.79769313486232e+308,2179794.04164876 566390.637591605 1051.46559305229 -1.79769313486232e+308,2179793.93295311 566388.137591605 1051.74478920181 -1.79769313486232e+308,2179793.82425746 566385.637591605 1052.01504940301 -1.79769313486232e+308,2179793.73503707 566383.585522703 1052.23441454273 -1.79769313486232e+308))"

Which fuzzy hashed looks like this:

# ssdeep -b temp
ssdeep,1.0--blocksize:hash:hash,filename
384:RpLQqsmoBbzjvPW7KEhoAyoGchYqpmeYOB7LCfCMJrOevT1WLwR7UyYIU:7Eq5oBbzjvPWuAyoGcSqIUNLCfCMJ6eq,"temp"
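For reference, the temp file above is just that EWKT dumped to disk. One way to produce it straight from the database is a COPY query; this is only a sketch, assuming server-side file access (psql's \copy would do the same from the client):

-- Sketch: write the EWKT of each geometry to a file for ssdeep to read.
-- Server-side COPY TO a file requires superuser privileges.
COPY (SELECT ST_AsEWKT(the_geom) FROM trail_test)
	TO '/tmp/temp';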

If we change some vertices and rerun, but this time change the ssdeep command to tell us similarity:

Original Geometry

Modified Geometry


Create a hash of the first geometry and write it to file:
# ssdeep -b temp > temphash

Then compare the second file against it:
# ssdeep -bm temphash temp1
temp1 matches temp (99)

Not surprisingly, they are 99% similar.
Now, this is a painful way to do the comparison. However, imagine a function in the PostGIS suite that let you do a geometry-by-geometry similarity comparison across tables. Imagine combining that analysis with a snap-to-grid function to further generalize its applicability.
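Even without such a function, the snap-to-grid half is easy to sketch today. Something like this (the grid size of 1.0 is an arbitrary example, and I'm assuming a gid key on the table) would produce generalized text ready for fuzzy hashing:

-- Sketch: generalize geometries to a grid before serializing to text, so
-- small coordinate jitter doesn't dominate the fuzzy-hash comparison.
SELECT gid, ST_AsEWKT(ST_SnapToGrid(the_geom, 1.0)) AS geom_text
	FROM trail_test;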

Any thoughts on uses? I know I’d use it to determine whether two data sources were related, i.e. were modified from the same parent, but I’m guessing there are some other more subtle applications.

Posted in PostGIS

The PostGIS learning Curve: ST_Within and ST_Crosses

Posted by smathermather on May 17, 2011

In an earlier post, I was frustrated that the intersection of a polygon and a line dataset came out as a geometry collection instead of the simple line dataset I expected. Nicklas Avén was kind enough to point out that this is a natural consequence of the logic of an intersection; the definition of ST_Intersects is: Returns TRUE if the Geometries/Geography "spatially intersect" (share any portion of space) and FALSE if they don't (they are Disjoint). "Any portion of space" is the critical phrase. So when lines intersect the edges of the polygon, that too is an intersection and gets included. How do we avoid this? First I tried this:


CREATE TABLE public.contour_2_cut AS
	SELECT contour.elevation, ST_Intersection(contour.the_geom, cutline) AS the_geom, gid FROM
		public.cuy_contours_2 AS contour, ST_GeomFromText('POLYGON((2136548 635811, 2137743 635811, 2137743 635010, 2136548 635010,2136548 635811))', 3734) AS cutline
	WHERE ST_Overlaps(contour.the_geom, cutline) OR ST_Within(contour.the_geom, cutline);

Which for a dataset like this:

Results in this:

Hmm, something isn't quite right. That something is that some of the contours we're clipping lie partially outside the frame we're clipping to, and these don't pass either test, ST_Overlaps or ST_Within (although, to be honest, I would have expected them to pass the ST_Overlaps query; as it turns out, ST_Overlaps only returns true for geometries of the same dimension, so a line can never overlap a polygon).

If instead I change the query to test whether each contour is within or crosses my polygon, thus:


CREATE TABLE public.contour_2_cut_4 AS
	SELECT contour.elevation, ST_Intersection(contour.the_geom, cutline) AS the_geom, gid FROM
		public.cuy_contours_2 AS contour, ST_GeomFromText('POLYGON((2136548 635811, 2137743 635811, 2137743 635010, 2136548 635010,2136548 635811))', 3734) AS cutline
	WHERE ST_Within(contour.the_geom, cutline) OR ST_Crosses(contour.the_geom, cutline);

I get the correct output:
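As an aside, another route would be to keep the simple ST_Intersects predicate and strip the stray pieces out of any geometry collections afterwards. This is only a sketch, assuming PostGIS 1.5+ where ST_CollectionExtract is available (the 2 asks for linear geometries):

-- Sketch: filter with plain ST_Intersects, then keep only the linear parts
-- of whatever ST_Intersection returns, collections included.
CREATE TABLE public.contour_2_cut_alt AS
	SELECT contour.elevation,
		ST_CollectionExtract(ST_Intersection(contour.the_geom, cutline), 2) AS the_geom,
		gid
	FROM public.cuy_contours_2 AS contour,
		ST_GeomFromText('POLYGON((2136548 635811, 2137743 635811, 2137743 635010, 2136548 635010, 2136548 635811))', 3734) AS cutline
	WHERE ST_Intersects(contour.the_geom, cutline);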

Posted in Database, PostGIS, SQL

Civic Commons

Posted by smathermather on May 12, 2011

Just a plug for Civic Commons, a network dedicated to:

“Let’s Save the Public Millions of Dollars

“Government entities at all levels face substantial and similar IT challenges, but today, each must take them on independently. Why can’t they share their technology, eliminating redundancy, fostering innovation, and cutting costs? We think they can. Civic Commons helps government agencies work together.”

I haven’t used Civic Commons yet, but sharing code, projects, and standards across the government sector is a laudable objective.

Posted in Other

GeoServer and efficient delivery of raster data (image pyramid layer)

Posted by smathermather on May 12, 2011

One thing I’ve learned in the last few years is that there is no theoretical reason why (properly indexed and summarized) data cannot be displayed at all scales as quickly as at any scale. This is the principle at work behind the extraordinary efficiencies of delivering data and imagery through a slippy map interface like Google, Bing, and OpenLayers as well as efficient thick client interfaces like Google Earth. So, in principle, and largely in practice, serving spatial data for a whole county or the whole world shouldn’t be any more onerous than serving data for a particular site, so long as you have adequate storage for the pre-rendered and summarized data, and time to pre-render the data. As storage tends to be cheaper than processing and network speed, this is a no-brainer.

A number of great Open Source tools exist to help with serving large amounts of data efficiently, not the least of which is my favorite, GeoServer (paired with GeoWebCache). For serving imagery, in our case orthorectified 6-inch (0.1524 meter) aerial imagery, we have a few options. GeoServer natively supports GeoTIFF, but for this large an area at this level of detail we'd have to wade into the realm of BigTIFF support through the GDAL extension, because we have 160 GB of imagery to serve. We could use wavelet-compressed imagery, e.g. MrSID, ECW, or JPEG 2000, but I don't have a license to create a lossless version of these, and besides, storage is cheaper than processors; wavelet-compressed imagery may be a good field solution, but for server-side work it doesn't make a lot of sense unless it's all you have available. Finally, there are two data source extensions to GeoServer meant for large imagery: the ImageMosaic plugin and the ImagePyramid plugin. The ImageMosaic plugin works well for serving large numbers of images and has some great flexibility with respect to handling transparency and image overlap. The ImagePyramid extension is tuned for serving imagery at many scales. The latter is what we chose to deploy.

The ImagePyramid extension takes advantage of gdal_retile.py, a utility built as part of GDAL that takes an image or set of images, re-tiles them to a standardized size (e.g. 2048×2048), and creates overviews as separate images in a hierarchy (here shown as outlines of the images):

But here's the problem: for some reason, I can't load all the images at once. If I do, only the low-resolution pyramids (8-foot pixels and larger) load. If I break the area into smaller chunks, most of them with fewer than 2,000 images, they load fine.

1" = 50' scale snapshot


1:32,000 scale snapshot

Posted in GeoServer, GeoWebCache