Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Archive for March, 2010

Optimizing PostGIS Geometry Functions– Mmm. Donut.

Posted by smathermather on March 6, 2010

Well, how can I PostGIS without referring back to BostonGIS.  One day I’ll know… .  But first, the Simpsons:  “Mmm… … Donut.”

A while back I read Regina Obe’s BostonGIS post on “Map Dicing,” essentially using a grid to dice up an existing dataset to a small granularity in order to take full advantage of spatial indexing, etc..  The post really intrigued me, and is often at the back of my mind as I’m working on doing heavy duty PostGIS queries (and waiting hours/days for them to complete).

There was a time when I worked a job (not long ago) and we would start data processes days before we expected to see them complete– sometimes 6-7 days.  So, generally, I don’t mind waiting (especially if I know something will likely complete).  Imagine my surprise at my own lack of patience then when I ran a couple of analyses in PostGIS, and they were still running 2 days later.  Well, I don’t want to be impatient, but I also don’t want to be inefficient!

A few posts back, I deleted some bad data from my contour dataset by chopping it out willy-nilly with an ST_Difference command.  It ran pretty quickly.  As in, overnight, which by most measures is fast enough.

So now, I’m trying to paste together the best of what I have for contours.  For our most urban county, I have 2ft contours derived from the combo of LiDAR and 3D breaklines– a beautiful (but PitA) dataset.  For the remaining 6 counties we want to map, I have 2 ft contours derived from (what I think is) a TIN interpolation of LiDAR points, that isn’t quite so detailed, or as effective at representing roads, cliffs, and waterways, but good enough for most purposes.  Sadly, the urban county sits surrounded by the other counties, and the bounds of the two datasets do not mesh perfectly.

In other words, I’ve got one cookie cutter and with it I want to cut a hole in one data set (our future donut), and trim another dataset to the same shape (our future donut hole), and then stick the two datasets together.

So using my two favorite cookie cutters, ST_Difference and ST_Intersection, I thought I’d use the county boundary of the urban area to remove all data within that county from the 7 county dataset (ST_Difference), and “clip” the urban county to the same strict boundary, stick the two datasets together, and wallah!

Problem is, it’s not as fast this time.  There’s two reasons for this.  On the dataset I am making into a donut, it is a dataset about 3 times as big.  So, naturally it should run more slowly.  On the dataset I’m making into the donut hole, however, the problem is with the inefficiency of meaningless bounding boxes– e.g. the bounding box of the cookie cutter is nearly as big as the full extent of the contour dataset, therefore, nearly all the tests against the bounding box as a filter to see whether there’s any likelihood of an intersection return a positive result, whether the contours in question are near the edge or not.

Enter “Map Dicing”.  I need smaller chunks of data.

Since I already have a grid I like to use for the county, I can skip Regina’s first step of creating a grid, and jump right into dicing up my cookie cutter:

CREATE TABLE county_cutter
 name character varying(15)

SELECT AddGeometryColumn('public', 'county_cutter', 'the_geom', 9102722, 'MULTIPOLYGON', 2);

INSERT INTO county_cutter(name,the_geom)
SELECT name,
 ST_multi(ST_Intersection(c.the_geom, cg.the_geom))
 FROM county c INNER JOIN county_grid cg ON ST_Intersects(c.the_geom, cg.the_geom);

CREATE INDEX county_cutter_the_geom_idx ON county_cutter USING gist(the_geom);

Now, I can get back to geo-processing.

CREATE TABLE public.cuy_contours_2_donut_hole
 SELECT elevation,
       ST_Intersection(a.the_geom, b.the_geom) AS the_geom
    FROM base.cuy_contours_2_1 as a,
         public.county_cutter as b;

CREATE TABLE public.contours_2_donut
 SELECT elevation,
       ST_Difference(a.the_geom, b.the_geom) AS the_geom
    FROM base.contours_2 as a,
         public.county_cutter as b;

Well, I should do this with a union statement in order to combine it into one table in one go.  I’ll revise before I run it, and report back whether it works.

Posted in Database, Database Optimization, PostGIS, SQL | Tagged: , , , , , | 1 Comment »