Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

PostGIS and LiDAR, the saga completes?

Posted by smathermather on April 6, 2010

I have to thank this post for the code that follows http://postgis.refractions.net/pipermail/postgis-users/2009-June/023660.html:

I’ve been putting at this nearest neighbor for points problem for a few months now, never dedicating enough time to think in through, nor expending enough energy to be really great at SQL, let alone how it can be applied in spatial operations with an OGC compliant database like PostGIS.  But here’s my naive nearest neighbor solution.  It’s my solution for getting vegetation height from a set of vegetation points and a set of ground points, basically taking the nearest ground point, and subtracting the z value of the ground from the z value of the vegetation.

The points themselves, while attributed with x, y, and z, are in fact 2D point sets, so the nearest point is not in the 3D domain, but the nearest point horizontally.  I did this on purpose– it reduces errors in areas with steep slopes, and has less computational overhead.

SELECT DISTINCT ON(g1.gid)  g1.z - g2.z as height
 FROM veg As g1, ground As g2   
 WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 15)   
 ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom);

height   
-----------
 0.00000
 33.80000
 0.52000
 0.00000
 0.00000
 0.00000
 10.79000
 11.55000
 6.92000
 7.61000
 22.54000
 50.16000
 50.62000
 50.96000
 45.93000

It’d be nice to know which points are which, and where they are at as well, so we’ll grab the gid for each point, as well as the value for x, y, and z:

SELECT DISTINCT ON(g1.gid)  g1.gid as gid, g2.gid as gid_ground, g1.x as x, g1.y as y, g1.z as z, g1.z - g2.z as height, the_geom g1.the_geom as geometry
 FROM veg As g1, ground As g2   
 WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 15)   
 ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom);

How efficient is this?

EXPLAIN ANALYZE SELECT DISTINCT ON(g1.gid)  g1.gid As gid, g2.gid As gid_ground, g1.x as x, g1.y as y, g1.z as z, g1.z - g2.z as height, the_geom as geometry
 FROM veg As g1, ground As g2
 WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 15)
 ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom);

Unique  (cost=1347373.60..1347373.61 rows=1 width=225) (actual time=6342.041..6358.930 rows=1882 loops=1)
->  Sort  (cost=1347373.60..1347373.60 rows=1 width=225) (actual time=6342.039..6353.324 rows=42239 loops=1)
Sort Key: g1.gid, (st_distance(g1.the_geom, g2.the_geom))
Sort Method:  external merge  Disk: 1704kB
->  Nested Loop  (cost=44.80..1347373.59 rows=1 width=225) (actual time=0.057..6267.593 rows=42239 loops=1)
Join Filter: ((g1.gid <> g2.gid) AND (g1.the_geom && st_expand(g2.the_geom, 15::double precision)) AND (g2.the_geom && st_expand(g1.the_geom, 15::double precision)) AND _st_dwithin(g1.the_geom, g2.the_geom, 15::double precision))
->  Seq Scan on ground g2  (cost=0.00..57.22 rows=2522 width=113) (actual time=0.011..0.927 rows=2522 loops=1)
->  Materialize  (cost=44.80..63.71 rows=1891 width=112) (actual time=0.000..0.170 rows=1891 loops=2522)
->  Seq Scan on veg g1  (cost=0.00..42.91 rows=1891 width=112) (actual time=0.006..0.634 rows=1891 loops=1)

Total runtime: 6392.724 ms

(10 rows)

Oh, yeah. We need some indexes.

CREATE INDEX veg_the_geom_idx ON veg USING gist(the_geom);
CREATE INDEX ground_the_geom_idx ON ground USING gist(the_geom);

 

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=1266.33..1266.33 rows=1 width=225) (actual time=378.057..393.715 rows=1882 loops=1)
->  Sort  (cost=1266.33..1266.33 rows=1 width=225) (actual time=378.055..387.902 rows=42239 loops=1)
Sort Key: g1.gid, (st_distance(g1.the_geom, g2.the_geom))
Sort Method:  external sort  Disk: 1712kB
->  Nested Loop  (cost=0.00..1266.32 rows=1 width=225) (actual time=0.127..302.111 rows=42239 loops=1)
Join Filter: ((g1.gid <> g2.gid) AND (g1.the_geom && st_expand(g2.the_geom, 15::double precision)) AND _st_dwithin(g1.the_geom, g2.the_geom, 15::double precision))
->  Seq Scan on veg g1  (cost=0.00..42.91 rows=1891 width=112) (actual time=0.014..0.490 rows=1891 loops=1)
->  Index Scan using ground_the_geom_idx on ground g2  (cost=0.00..0.37 rows=1 width=113) (actual time=0.025..0.065 rows=29 loops=1891)
Index Cond: (g2.the_geom && st_expand(g1.the_geom, 15::double precision))
Total runtime: 410.327 ms

(10 rows)

Whew! That’s a bit faster. Will clustering help us this time?

ALTER TABLE veg CLUSTER ON veg_the_geom_idx;
ALTER TABLE ground CLUSTER ON ground_the_geom_idx;

 

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=1266.33..1266.33 rows=1 width=225) (actual time=377.542..393.175 rows=1882 loops=1)
->  Sort  (cost=1266.33..1266.33 rows=1 width=225) (actual time=377.541..387.374 rows=42239 loops=1)
Sort Key: g1.gid, (st_distance(g1.the_geom, g2.the_geom))
Sort Method:  external sort  Disk: 1712kB
->  Nested Loop  (cost=0.00..1266.32 rows=1 width=225) (actual time=0.065..302.340 rows=42239 loops=1)
Join Filter: ((g1.gid <> g2.gid) AND (g1.the_geom && st_expand(g2.the_geom, 15::double precision)) AND _st_dwithin(g1.the_geom, g2.the_geom, 15::double precision))
->  Seq Scan on veg g1  (cost=0.00..42.91 rows=1891 width=112) (actual time=0.006..0.459 rows=1891 loops=1)
->  Index Scan using ground_the_geom_idx on ground g2  (cost=0.00..0.37 rows=1 width=113) (actual time=0.025..0.065 rows=29 loops=1891)
Index Cond: (g2.the_geom && st_expand(g1.the_geom, 15::double precision))
Total runtime: 409.784 ms
(10 rows)

Eh, not much. But this is a pretty small dataset. We’ll see later how it helps on a larger dataset.

And the relavent stats:

These were run on a tortured late model MacBook Pro, with a 2.4GHz Intel Core 2 Duo, with a couple of gigs of RAM on PostgreSQL.

SELECT version();
 version                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.2 on i386-apple-darwin10.2.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1), 64-bit
(1 row)

and PostGIS 1.5.1.

I promised a test in this post to find out whether my supposition that I have a dense enough LiDAR point cloud precludes the need for doing the difference with an interpolated ground point cloud, vs. the raw point cloud as I’m using it here.  I haven’t forgotten… .  That’s the next test, I think.

2 Responses to “PostGIS and LiDAR, the saga completes?”

  1. […] thing that I didn’t do two posts ago was to spend any time deciding how far out the search for nearby points with ST_DWithin would […]

  2. […] The busiest day of the year was November 11th with 68 views. The most popular post that day was PostGIS and LiDAR, the saga completes?. […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: