Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Archive for February, 2010

Contour data and table management in PostGIS

Posted by smathermather on February 23, 2010

My contour dataset is almost ready to go live.  Just a few more tweaks.  I was manipulating it in the public schema.  I really want it in my base map schema “base”.

Want to change schemas on a table in Postgre? No problem:

ALTER TABLE public.cuy_contours_2_clip SET SCHEMA base;

Rename a table? No problem:

ALTER TABLE base.cuy_contours_2_clip RENAME TO base.cuy_contours_2;

Save space by not storing higher precision data than you need to (e.g. NUMERIC data for what should be an integer elevation):

ALTER TABLE base.cuy_contours_2
 ALTER COLUMN elevation TYPE smallint;

Now I want pre-built queries that store the two foot contour data as only the 10 foot, 20 foot, 50 foot, etc..  These different versions of the dataset will then have zoom factors applied to them when I serve them up in GeoServer.  To achieve this, we use Postgre’s modulus operator with WHERE statement, e.g.:

CREATE TABLE base.cuy_contours_250
 AS
 SELECT elevation, the_geom
 FROM base.cuy_contours_2
 WHERE base.cuy_contours_2.elevation % 250 = 0;

And the rest:

CREATE TABLE base.cuy_contours_10
 AS
 SELECT elevation, the_geom
 FROM base.cuy_contours_2
 WHERE base.cuy_contours_2.elevation % 10 = 0;
CREATE TABLE base.cuy_contours_20
 AS
 SELECT elevation, the_geom
 FROM base.cuy_contours_2
 WHERE base.cuy_contours_2.elevation % 20 = 0;
CREATE TABLE base.cuy_contours_50
 AS
 SELECT elevation, the_geom
 FROM base.cuy_contours_2
 WHERE base.cuy_contours_2.elevation % 50 = 0;
CREATE TABLE base.cuy_contours_100
 AS
 SELECT elevation, the_geom
 FROM base.cuy_contours_2
 WHERE base.cuy_contours_2.elevation % 100 = 0;

Posted in Database, Database Optimization, PostGIS, SQL | Tagged: , , , , | 1 Comment »

YouTube Canopy Video

Posted by smathermather on February 9, 2010

Earlier, I posted some simple pictures from this short video clip using LiDAR to determine vegetation shape, animated in PovRay. Now, I post the video itself (all 7 seconds):

I did what YouTube says not to do, and that is to upload a vignetted version of video. Such is. I’ll upload a better one shortly eventually.

Posted in POV-Ray | Tagged: , , , | Leave a Comment »

PostGIS and LiDAR– oops!

Posted by smathermather on February 7, 2010

I won’t offer much prelude.  Read this post, and this post first… .

Inadvertently I demonstrated the value of spatial indices, i.e. I meant to use them and didn’t.  In attempting to sort my tables by their spatial index I got the following errors:

ALTER TABLE lidar_ground CLUSTER ON lidar_ground_the_geom_idx;
 ERROR:  "lidar_ground_the_geom_idx" is not an index for table "lidar_ground"
ALTER TABLE lidar_veg CLUSTER ON lidar_veg_the_geom_idx;
 ERROR:  "lidar_veg_the_geom_idx" is not an index for table "lidar_veg"

Looking further, I saw there were no spatial indices on either table.

Returning to my original post, I had typos:

CREATE INDEX lidar_veg_the_geom_idx ON lidar USING gist(the_geom);
CREATE INDEX lidar_ground_the_geom_idx ON lidar USING gist(the_geom);

Do you see it?  It should be

CREATE INDEX lidar_veg_the_geom_idx ON lidar_veg USING gist(the_geom);
CREATE INDEX lidar_ground_the_geom_idx ON lidar_ground USING gist(the_geom);

I created two identical indices on the lidar table, not on the two tables I was interested in.  So, what was the relative performance hit?

Without the indices:

Vinton=# EXPLAIN ANALYZE SELECT pgis_fn_nn(b.the_geom, 10,1,1,'lidar_ground','true','vnum','the_geom') FROM lidar_veg AS b WHERE b.vnum > 576505 AND b.vnum < 576600;
 QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using lidar_veg_pkey on lidar_veg b  (cost=0.00..108.97 rows=31 width=100) (actual time=571.154..28589.303 rows=50 loops=1)
 Index Cond: ((vnum > 576505) AND (vnum < 576600))
 Total runtime: 28589.398 ms
(3 rows)

With the indices:

EXPLAIN ANALYZE SELECT pgis_fn_nn(b.the_geom, 10,1,1,'lidar_ground','true','vnum','the_geom') FROM lidar_veg AS b WHERE b.vnum > 576505 AND b.vnum < 576600;
 QUERY PLAN                                                       

------------------------------------------------------------------------------------------------------------------------------
-------
 Index Scan using lidar_veg_pkey on lidar_veg b  (cost=0.00..108.97 rows=31 width=100) (actual time=86.489..138.579 rows=50 lo
ops=1)
 Index Cond: ((vnum > 576505) AND (vnum < 576600))
 Total runtime: 138.717 ms
(3 rows)

Now with CLUSTERing:

ALTER TABLE lidar_veg CLUSTER ON lidar_veg_the_geom_idx;
ALTER TABLE lidar_ground CLUSTER ON lidar_ground_the_geom_idx;

EXPLAIN ANALYZE SELECT pgis_fn_nn(b.the_geom, 10,1,1,'lidar_ground','true','vnum','the_geom') FROM lidar_veg AS b WHERE b.vnum > 576505 AND b.vnum < 576600;
 QUERY PLAN                                                        

------------------------------------------------------------------------------------------------------------------------------
-----
 Index Scan using lidar_veg_pkey on lidar_veg b  (cost=0.00..108.97 rows=31 width=100) (actual time=1.929..56.285 rows=50 loop
s=1)
 Index Cond: ((vnum > 576505) AND (vnum < 576600))
 Total runtime: 56.378 ms
(3 rows)

Indices sped up the query by a factor of 206 times.  Adding the CLUSTER sped it up another 2.4 times over and above the indices for a total of a query that runs 507 times faster.  Did I say the full query would take a few years?  Ahem, I meant a couple days.  Specifically about 2.

Posted in Database, Database Optimization, PostGIS, SQL | Tagged: , , , , , , , | Leave a Comment »

Rethinking PostGIS Analyses– Remembering to CLUSTER

Posted by smathermather on February 6, 2010

Something’s been nagging at me as far as the level of optimization (or lack there-of) that I did to my database before my other posts of using PostGIS to analyze LiDAR data (e.g. this post).  It seemed my results were remarkably slow, but I couldn’t put my finger on why that was problematic.

Then, as I was testing a smaller lidar dataset in order to do a shading analysis for my future garden, I noticed that with those 1800 points (a high density point cloud considering I live on a quarter-acre), the nearest-neighbor algorithm, per point, was much faster than when applied to the larger dataset I was testing before.  On one hand this seems reasonable.  A bigger dataset takes longer to process.  But, properly optimized, a large dataset should not take significantly longer per record.  We want linear, or near linear, scaling of processing.  When we don’t, in this case, it indicates that the data are not structured in a way that allows for fast access to records of interest.

Re-ordering records (PostgreSQL CLUSTER) to reflect the hierarchy of an index is a good way to optimize a database, assuming you know which index is your most important one to use to organize your data structure.  In the case of large spatial datasets, this is usually the spatial index.

In this example, I will apply CLUSTERing to the PostGIS database, clustering on the spatial index in order to optimize spatial queries.

ALTER TABLE lidar_ground CLUSTER ON lidar_ground_the_geom_idx;
ALTER TABLE lidar_veg CLUSTER ON lidar_veg_the_geom_idx;

I will report back with some results.  Hopefully this helps.

Posted in Database, Database Optimization, PostGIS, SQL | Tagged: , , , , , , | Leave a Comment »

PostGIS and LiDAR, the saga continues

Posted by smathermather on February 1, 2010

I ran these tests a while back, but I’m just getting to post them now.  These were run on a late model MacBook Pro, with a 2.4GHz Intel Core 2 Duo, with a couple of gigs of RAM on PostgreSQL.  In Postgre, if I run

SELECT version();

I get:

PostgreSQL 8.4.1 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5493), 64-bit

My hard drive is as follows:

Vendor:    NVidia
 Product:    MCP79 AHCI
 Speed:    3 Gigabit
 Description:    AHCI Version 1.20 Supported

Ok.  So enough prelude.  Back in New Year’s day post, I used the nearest neighbor function posted on the Boston GIS page to find the nearest ground point to one of my vegetation points in an attempt to derived vegetation height within a LiDAR dataset loaded into PostGIS.  I got as far as identifying the nearest neighbor, and then stopped.  What next?  Well, how fast (or slow) is that nearest neighbor search?

Vinton=# EXPLAIN ANALYZE SELECT pgis_fn_nn(b.the_geom, 10,1,1,'lidar_ground','true','vnum','the_geom') FROM lidar_veg AS b WHERE b.vnum > 576505 AND b.vnum < 576600;
 QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using lidar_veg_pkey on lidar_veg b  (cost=0.00..108.97 rows=31 width=100) (actual time=571.154..28589.303 rows=50 loops=1)
 Index Cond: ((vnum > 576505) AND (vnum < 576600))
 Total runtime: 28589.398 ms
(3 rows)

I also timed it with a stop watch.  No matter how I changed the search parameters for the nearest neighbor function, it took almost exactly 35 seconds.  Extrapolating upward, that means that to do this for one of my blocks of data would take an excess of 50 hours.  I have 559 of these blocks of data, so we’re looking at ~28,000 hours of computing time, or about 3.2 years.  It’s my wife’s laptop, and I don’t think I should take it over for that long… .

So, with no real knowledge of how to optimize things, I’m thinking instead of a nearest neighbor approach, perhaps a point-in-polygon analysis with Voronoi polygons created from my LiDAR ground points.  I can use R to generate my polygons, ala (yet again) Boston GIS, and then ST_Contains to do the spatial join.  Hopefully by already having the geometry constructed with R, it will be a lot more efficient that expanding buffers from points.

Posted in Database, Database Optimization, PostGIS, SQL | Tagged: , , , , | 4 Comments »