# PostGIS and LiDAR, the saga completes?

I have to thank this post for the code that follows http://postgis.refractions.net/pipermail/postgis-users/2009-June/023660.html:

I’ve been putting at this nearest neighbor for points problem for a few months now, never dedicating enough time to think in through, nor expending enough energy to be really great at SQL, let alone how it can be applied in spatial operations with an OGC compliant database like PostGIS.  But here’s my naive nearest neighbor solution.  It’s my solution for getting vegetation height from a set of vegetation points and a set of ground points, basically taking the nearest ground point, and subtracting the z value of the ground from the z value of the vegetation.

The points themselves, while attributed with x, y, and z, are in fact 2D point sets, so the nearest point is not in the 3D domain, but the nearest point horizontally.  I did this on purpose– it reduces errors in areas with steep slopes, and has less computational overhead.

```SELECT DISTINCT ON(g1.gid)  g1.z - g2.z as height
FROM veg As g1, ground As g2
WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 15)
ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom);

height
-----------
0.00000
33.80000
0.52000
0.00000
0.00000
0.00000
10.79000
11.55000
6.92000
7.61000
22.54000
50.16000
50.62000
50.96000
45.93000
```

It’d be nice to know which points are which, and where they are at as well, so we’ll grab the gid for each point, as well as the value for x, y, and z:

```SELECT DISTINCT ON(g1.gid)  g1.gid as gid, g2.gid as gid_ground, g1.x as x, g1.y as y, g1.z as z, g1.z - g2.z as height, the_geom g1.the_geom as geometry
FROM veg As g1, ground As g2
WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 15)
ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom);
How efficient is this?
EXPLAIN ANALYZE SELECT DISTINCT ON(g1.gid)  g1.gid As gid, g2.gid As gid_ground, g1.x as x, g1.y as y, g1.z as z, g1.z - g2.z as height, the_geom as geometry
FROM veg As g1, ground As g2
WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 15)
ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom);

Unique  (cost=1347373.60..1347373.61 rows=1 width=225) (actual time=6342.041..6358.930 rows=1882 loops=1)
->  Sort  (cost=1347373.60..1347373.60 rows=1 width=225) (actual time=6342.039..6353.324 rows=42239 loops=1)
Sort Key: g1.gid, (st_distance(g1.the_geom, g2.the_geom))
Sort Method:  external merge  Disk: 1704kB
->  Nested Loop  (cost=44.80..1347373.59 rows=1 width=225) (actual time=0.057..6267.593 rows=42239 loops=1)
Join Filter: ((g1.gid <> g2.gid) AND (g1.the_geom && st_expand(g2.the_geom, 15::double precision)) AND (g2.the_geom && st_expand(g1.the_geom, 15::double precision)) AND _st_dwithin(g1.the_geom, g2.the_geom, 15::double precision))
->  Seq Scan on ground g2  (cost=0.00..57.22 rows=2522 width=113) (actual time=0.011..0.927 rows=2522 loops=1)
->  Materialize  (cost=44.80..63.71 rows=1891 width=112) (actual time=0.000..0.170 rows=1891 loops=2522)
->  Seq Scan on veg g1  (cost=0.00..42.91 rows=1891 width=112) (actual time=0.006..0.634 rows=1891 loops=1)

Total runtime: 6392.724 ms

(10 rows)

Oh, yeah.  We need some indexes.
CREATE INDEX veg_the_geom_idx ON veg USING gist(the_geom);
CREATE INDEX ground_the_geom_idx ON ground USING gist(the_geom);

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=1266.33..1266.33 rows=1 width=225) (actual time=378.057..393.715 rows=1882 loops=1)
->  Sort  (cost=1266.33..1266.33 rows=1 width=225) (actual time=378.055..387.902 rows=42239 loops=1)
Sort Key: g1.gid, (st_distance(g1.the_geom, g2.the_geom))
Sort Method:  external sort  Disk: 1712kB
->  Nested Loop  (cost=0.00..1266.32 rows=1 width=225) (actual time=0.127..302.111 rows=42239 loops=1)
Join Filter: ((g1.gid <> g2.gid) AND (g1.the_geom && st_expand(g2.the_geom, 15::double precision)) AND _st_dwithin(g1.the_geom, g2.the_geom, 15::double precision))
->  Seq Scan on veg g1  (cost=0.00..42.91 rows=1891 width=112) (actual time=0.014..0.490 rows=1891 loops=1)
->  Index Scan using ground_the_geom_idx on ground g2  (cost=0.00..0.37 rows=1 width=113) (actual time=0.025..0.065 rows=29 loops=1891)
Index Cond: (g2.the_geom && st_expand(g1.the_geom, 15::double precision))
Total runtime: 410.327 ms

(10 rows)

Whew!  That’s a bit faster.  Will clustering help us this time?
ALTER TABLE veg CLUSTER ON veg_the_geom_idx;
ALTER TABLE ground CLUSTER ON ground_the_geom_idx;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=1266.33..1266.33 rows=1 width=225) (actual time=377.542..393.175 rows=1882 loops=1)
->  Sort  (cost=1266.33..1266.33 rows=1 width=225) (actual time=377.541..387.374 rows=42239 loops=1)
Sort Key: g1.gid, (st_distance(g1.the_geom, g2.the_geom))
Sort Method:  external sort  Disk: 1712kB
->  Nested Loop  (cost=0.00..1266.32 rows=1 width=225) (actual time=0.065..302.340 rows=42239 loops=1)
Join Filter: ((g1.gid <> g2.gid) AND (g1.the_geom && st_expand(g2.the_geom, 15::double precision)) AND _st_dwithin(g1.the_geom, g2.the_geom, 15::double precision))
->  Seq Scan on veg g1  (cost=0.00..42.91 rows=1891 width=112) (actual time=0.006..0.459 rows=1891 loops=1)
->  Index Scan using ground_the_geom_idx on ground g2  (cost=0.00..0.37 rows=1 width=113) (actual time=0.025..0.065 rows=29 loops=1891)
Index Cond: (g2.the_geom && st_expand(g1.the_geom, 15::double precision))
Total runtime: 409.784 ms
(10 rows)

Eh, not much.  But this is a pretty small dataset.  We’ll see later how it helps on a larger dataset.
And the relavent stats:
These were run on a tortured late model MacBook Pro, with a 2.4GHz Intel Core 2   Duo, with a couple of gigs of RAM on PostgreSQL.
SELECT version();
version
------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.2 on i386-apple-darwin10.2.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1), 64-bit
(1 row)
and PostGIS 1.5.1.
I promised a test in this post to find out whether my supposition that I have a dense enough LiDAR point cloud precludes the need for doing the difference with an interpolated ground point cloud, vs. the raw point cloud as I’m using it here.  I haven’t forgotten… .  That’s the next test, I think.