For a couple of months now I’ve been corresponding with Etienne Racine and Pierre Racine, out of Laval University in Quebec City. They decided to take on the problem of benchmarking the speed and accuracy of a number of different techniques for extracting canopy height from LiDAR data, and they have been kind enough to allow me to post the results here. This will be a multipart series: we’ll start with the punchline, and then build out the code behind each of the queries/calculations in separate posts (the category will be “LiDAR Shootout”).
In the hopper for testing were essentially four different ways to extract canopy height from a LiDAR point cloud and (in three of the cases) a DEM derived from that point cloud. The four techniques were as follows:
- Use ArcGIS’s ExtractValuesToPoints tool.
- Use R with the raster and rgdal packages, calling raster::extract() with in-memory results.
- Use the PostGIS (2.0 svn) raster datatype (formerly WKT Raster).
- Use a simple nearest-neighbor approach with ST_DWithin in PostGIS (a sketch follows this list).
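The full queries will get their own posts, but to give a flavor of the nearest-neighbor approach now, here is a minimal sketch. The schema (veg_points, ground_points, with geom and z columns) and the search radius are stand-ins for illustration, not the actual shootout code:

```sql
-- Minimal sketch of the NN height calculation (hypothetical schema):
-- for each vegetation return, find the closest ground return within a
-- search radius and difference the elevations. ST_DWithin lets the
-- spatial index prune candidates before ST_Distance orders them.
SELECT DISTINCT ON (v.id)
       v.id,
       v.z - g.z AS canopy_height
FROM   veg_points v
JOIN   ground_points g
       ON ST_DWithin(v.geom, g.geom, 10.0)  -- radius in map units (assumed)
ORDER  BY v.id, ST_Distance(v.geom, g.geom);
```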
The data are from the Ohio Statewide Imagery Program (OSIP), run by the Ohio Geographically Referenced Information Program (OGRIP). Ohio is blessed with an excellent 2006-2008 LiDAR dataset and a statewide DEM derived from the (effectively) 3.5-foot horizontal posting data (specs may say 5-ft, I can’t remember…).
Speeeed:
So, first to speed and the initial results. Who wins the speed category? Measured in points per second (all on the same hardware), nearest neighbor wins by a landslide (bigger is better here):
| Application | Points per Second |
| --- | --- |
| ArcGIS | 4504 |
| R | 3228 |
| PostGIS Raster | 1451 (maybe as high as 4500) |
| PostGIS NN | 18208 |
Now, later on, Pierre discovered that changes to indexing, plus an optimization found with the EXPLAIN query analyzer, tripled the PostGIS Raster query speed, making it about as fast as ArcGIS. More on that later.
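For contrast with the NN sketch above, the raster-based extraction in PostGIS boils down to sampling the DEM under each point with ST_Value. Again, a hedged sketch with assumed table names, not the tuned query Pierre ended up with:

```sql
-- Sketch of the PostGIS Raster technique (hypothetical schema):
-- locate the DEM tile under each point and sample its elevation.
SELECT p.id,
       p.z - ST_Value(d.rast, p.geom) AS canopy_height
FROM   lidar_points p
JOIN   dem d
       ON ST_Intersects(d.rast, p.geom);  -- tile index prunes the join
```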
Figure removed; to be replaced in another post.
Accuracy:
A fast calculation is always better, unless you trade off accuracy in some detrimental way. With PostGIS NN, we’re not interpolating our LiDAR ground point cloud before calculating the difference, so relative to the other three techniques we could be introducing some bias or artifacts, a point Jeff Evans makes here. The overall error introduced by the NN technique on this dataset, relative to the interpolated solution: 0.268 feet. Whew! Sigh of relief for me! We may spend some time looking at the geographic distribution of that error to see whether it’s random or patterned, but that’s very near the error level of the input data.
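For the curious, that overall error number is just a summary statistic over the per-point differences between the two solutions. Assuming a staging table with both heights precomputed (height_comparison is a made-up name), the calculation is a one-liner:

```sql
-- Hypothetical sketch: summarize per-point disagreement between the
-- NN-derived heights (h_nn) and the interpolated-DEM heights (h_dem).
SELECT avg(h_nn - h_dem)           AS mean_error,
       sqrt(avg((h_nn - h_dem)^2)) AS rmse
FROM   height_comparison;  -- one row per LiDAR point
```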
Addendum:
Eight days ago, Martin Isenburg let the lasheight tool drop. LAStools is impressively fast. Preliminary tests by Etienne: 140,450 points per second.
Oy! A factor of 8 faster than PostGIS NN. And it uses a DEM calculated on the fly via Delaunay triangulation. Preliminary results indicate a very different DEM than the one calculated by the state:
RMSE: 25.4227446600656
More research will be needed to find the source of the differences: they are not random.
In other news:
PostGIS will get a true KNN (k-nearest neighbor) technique in the near future. Such a project is funded, so stay tuned for faster results:
http://www.postgis.org/pipermail/postgis-devel/2011-June/013931.html
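For a sense of what that work looks like from the query side, later PostGIS releases (2.0 on PostgreSQL 9.1+) expose an index-assisted distance operator, <->, which walks the GiST index in distance order; for point data it returns the true nearest neighbor with no search radius to guess. A sketch with placeholder names, coordinates, and SRID:

```sql
-- Sketch of index-assisted KNN (PostGIS 2.0+ / PostgreSQL 9.1+):
-- <-> orders rows by index distance, so LIMIT 1 pulls the nearest
-- ground point without an ST_DWithin radius to tune. The point
-- coordinates and SRID below are placeholders.
SELECT g.z
FROM   ground_points g
ORDER  BY g.geom <-> ST_SetSRID(ST_MakePoint(1830000, 740000), 3734)
LIMIT  1;
```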
Also, index changes to PostGIS could come down the pike, if funded, that would make bounding-box queries six times faster, à la:
http://blog.opengeo.org/2011/05/27/pgcon-notes-3/
Between the two optimizations, we could give LAStools a run for its money on speed 🙂. In the meantime, preliminary RMSE aside, LAStools (the fifth tool, the one not in this test) may win hands down.
Starting with the ‘punchline’ and examining the validity of the methodology/comparison later? I guess we should interpret ‘punchline’ literally as a joke.
Do you have a recommended methodology? I’d be happy to throw it in with the overall comparison.
I should state, as an addendum to my reply and for the benefit of other readers, that based on his e-mail Clayton works for ESRI, so I’d love to have him weigh in on the choice of ExtractValuesToPoints as a method for finding the height values for a LiDAR point cloud. If there’s a better/faster way, I’d love to test it.
Very cool and very interesting! Have you considered including Robert McGaughey’s FUSION canopy height model (http://www.fs.fed.us/eng/rsac/fusion/)? What about other pay-ware programs like TIFFS (http://globalidar.com/default.aspx) or Matlab scripts?
I have not yet looked into any of these. I used Matlab extensively when I did radar research, but found it slow relative to most of the C++ code we worked with. That said, we never paid for any of their high-end processing software, just the vanilla stuff, and I doubt our Matlab coding was terribly efficient; it was mostly naive coding just to get the work done.
Definitely add Quick Terrain Modeler, TIFFS, and MARS as all three are 64-bit multicore enabled.
I don’t think that coming up with an alternative to one function in isolation from a workflow or purpose is particularly useful. Getting above-ground heights for points is not the ultimate goal, is it? That aside, look into:
1: If it’s a normalized height surface you’re after, use the Minus tool between first-return and bare-earth surfaces.
2: If you need height per point, use InterpolateShape. As inputs, use a multipoint feature class containing above-ground points (made with LASToMultipoint) against a normalized height surface raster. The resulting multipoints will have height above ground assigned to the point geometry.
Regards, Clayton
1: I have not been satisfied with the outputs of first return minus bare earth. Perhaps it’s a lack of optimization of the raster grid density against the input point density, the interpolation/interpolator choices, or perhaps a function of the complexity of canopies and the slightly non-nadir angle of most lidar points (i.e. not all first-return points are the top of the canopy/surface), but I’ve found the products of this approach lacking in cartographic verisimilitude, though the statistical accuracy is undoubtedly just fine. The eye is quite sensitive to non-random error, even if parametric statistics are not. In addition, this approach does not work for point clouds that are missing return-number information, as is the case for the Ohio OSIP dataset.
2: Pure point techniques allow for further processing to a surface. As an intermediate product, they retain their veracity irrespective of what interpolation techniques are later used to generate a surface, whether the endpoint is a raster or a TIN. I’ll look into using InterpolateShape in combination with a multipoint cloud; this “function in isolation” may be the sort of answer I was looking for. 🙂 Single-point raster extracts (i.e. ExtractValuesToPoints) are naturally slow. This is closely related to the horses-vs.-dogs argument Pierre made in the latest post in this series.
An interesting alternative approach would be to do none of the above, but instead calculate a minimum bounding polyhedron, or 3D concave hull, and then calculate a TIN of the difference or sample it to a raster surface. That could be esoteric overkill, of course, but it might allow for some really nice optimizations that don’t require comparing points to rasters at all.
I just tried out FUSION in my quest to determine what compatible versus incompatible LAS 1.4 files would do to existing software. FUSION is one of those packages that can handle the compatible LAS 1.4 I have been advocating, but it will produce corrupt output or crash with the officially proposed LAS 1.4.