I ran these tests a while back, but I’m just getting around to posting them now. They were run on a late-model MacBook Pro with a 2.4GHz Intel Core 2 Duo and a couple of gigabytes of RAM, running PostgreSQL. If I run
SELECT version();
I get:
PostgreSQL 8.4.1 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5493), 64-bit
My hard drive controller is as follows:
Vendor: NVidia
Product: MCP79 AHCI
Speed: 3 Gigabit
Description: AHCI Version 1.20 Supported
OK, so enough prelude. Back in my New Year’s Day post, I used the nearest neighbor function posted on the Boston GIS page to find the nearest ground point to one of my vegetation points, in an attempt to derive vegetation height within a LiDAR dataset loaded into PostGIS. I got as far as identifying the nearest neighbor, and then stopped. What next? Well, how fast (or slow) is that nearest neighbor search?
Vinton=# EXPLAIN ANALYZE SELECT pgis_fn_nn(b.the_geom, 10,1,1,'lidar_ground','true','vnum','the_geom')
FROM lidar_veg AS b
WHERE b.vnum > 576505 AND b.vnum < 576600;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using lidar_veg_pkey on lidar_veg b  (cost=0.00..108.97 rows=31 width=100) (actual time=571.154..28589.303 rows=50 loops=1)
   Index Cond: ((vnum > 576505) AND (vnum < 576600))
 Total runtime: 28589.398 ms
(3 rows)
I also timed it with a stopwatch. No matter how I changed the search parameters for the nearest neighbor function, it took almost exactly 35 seconds. Extrapolating upward, that means that running this over one of my blocks of data would take in excess of 50 hours. I have 559 of these blocks of data, so we’re looking at ~28,000 hours of computing time, or about 3.2 years. It’s my wife’s laptop, and I don’t think I should take it over for that long… .
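For the curious, that back-of-the-envelope arithmetic (559 blocks at roughly 50 hours each) can be sanity-checked right in psql:

-- 559 blocks * ~50 hours each, then hours-per-year to get years
SELECT 559 * 50 AS total_hours,
       round(559 * 50 / (24.0 * 365), 1) AS years;
-- total_hours = 27950, years = 3.2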
So, with no real knowledge of how to optimize things, I’m thinking that instead of a nearest neighbor approach, perhaps a point-in-polygon analysis with Voronoi polygons created from my LiDAR ground points would work. I can use R to generate my polygons, à la (yet again) Boston GIS, and then use ST_Contains to do the spatial join. Hopefully, by already having the geometry constructed in R, it will be a lot more efficient than expanding buffers from points.
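A sketch of what that spatial join might look like, assuming the R-generated Voronoi polygons get loaded into a (hypothetical) lidar_ground_voronoi table carrying each generating ground point’s vnum:

-- Each vegetation point falls inside exactly one Voronoi cell,
-- and that cell's generating ground point is its nearest neighbor.
SELECT v.vnum AS veg_vnum, p.vnum AS ground_vnum
FROM lidar_veg AS v
JOIN lidar_ground_voronoi AS p
  ON ST_Contains(p.the_geom, v.the_geom);

With spatial indices on both geometry columns, the join should be driven by bounding-box overlap tests rather than per-point expanding searches.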
One note– the 35 seconds was for a test of 50 points, not 1 point; looking back over my notes, I remembered this. But my estimates were based on that number, so it would still take about 3.2 years as currently structured.
Did you run a ‘create spatial index’ function on the lidar data before the NN search? Also, ST_DWithin, as far as I understand it, is more optimized internally than ST_Contains, so it might make a better initial query; it uses a rectangular query first, so there’s no sqrt function being called until the simple bounding box query has been resolved.
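For illustration, an ST_DWithin-based version of the search might look something like this (a sketch only — the 10-unit search radius and the single test point are made up, and it assumes a projected coordinate system):

-- Index-assisted candidate filter: ST_DWithin does a bounding-box
-- test first, so exact distances are only computed for nearby rows.
SELECT g.vnum, ST_Distance(v.the_geom, g.the_geom) AS dist
FROM lidar_veg AS v, lidar_ground AS g
WHERE v.vnum = 576506
  AND ST_DWithin(v.the_geom, g.the_geom, 10.0)
ORDER BY dist
LIMIT 1;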
Hi Wes,
I thought I had created a spatial index, but in reviewing my notes, I noticed some typos… see https://smathermather.wordpress.com/2010/02/07/postgis-and-lidar-oops/ . The nearest neighbor approach I used employs Boston GIS’ algorithm http://www.bostongis.com/?content_name=postgis_nearest_neighbor_generic , which, if I read it correctly, uses expanding bounding boxes in order to be fast. With spatial indices, the query is more than 200 times faster. Add some CLUSTERing on the spatial index, and it’s 500 times faster than the raw analysis.
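For reference, the commands involved are along these lines (the index name here is made up; GiST is the index type PostGIS uses for geometry columns):

-- Build a spatial (GiST) index on the ground points
CREATE INDEX lidar_ground_the_geom_idx
  ON lidar_ground USING GIST (the_geom);

-- Physically reorder the table to match the index, then refresh stats
CLUSTER lidar_ground USING lidar_ground_the_geom_idx;
ANALYZE lidar_ground;

CLUSTERing helps because spatially adjacent points end up on the same disk pages, so each bounding-box probe touches far fewer pages.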