Rethinking PostGIS Analyses– Remembering to CLUSTER

Something’s been nagging at me as far as the level of optimization (or lack there-of) that I did to my database before my other posts of using PostGIS to analyze LiDAR data (e.g. this post). It seemed my results were remarkably slow, but I couldn’t put my finger on why that was problematic.

Then, as I was testing a smaller lidar dataset in order to do a shading analysis for my future garden, I noticed that with those 1800 points (a high density point cloud considering I live on a quarter-acre), the nearest-neighbor algorithm, per point, was much faster than when applied to the larger dataset I was testing before. On one hand this seems reasonable. A bigger dataset takes longer to process. But, properly optimized, a large dataset should not take significantly longer per record. We want linear, or near linear, scaling of processing. When we don’t, in this case, it indicates that the data are not structured in a way that allows for fast access to records of interest.

Re-ordering records (PostgreSQL CLUSTER) to reflect the hierarchy of an index is a good way to optimize a database, assuming you know which index is your most important one to use to organize your data structure. In the case of large spatial datasets, this is usually the spatial index.

In this example, I will apply CLUSTERing to the PostGIS database, clustering on the spatial index in order to optimize spatial queries.

ALTER TABLE lidar_ground CLUSTER ON lidar_ground_the_geom_idx;
ALTER TABLE lidar_veg CLUSTER ON lidar_veg_the_geom_idx;

I will report back with some results. Hopefully this helps.

Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Rethinking PostGIS Analyses– Remembering to CLUSTER

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply