I loaded a set of contours into the database that are really nice, up-to-date LiDAR and breakline derived contours. They are engineering-grade aerial contours, but they are a very big and complex dataset. I had hoped the hard part was done: using shp2pgsql, I converted them to PostGIS insert statements and dumped them into the database, only to find that some sections are corrupt. I’m hoping to use the function ST_Difference to clip out the bad areas so I can insert them again.
There are two problems in the contour dataset: some areas contain duplicated records, and some areas are missing or otherwise corrupted. I’ve got a polygon file (in purple below) that delineates the problem areas, so I hope to use ST_Difference to remove these areas and then repopulate them with good data.
CREATE TABLE public.cuy_contours_2_clip AS
SELECT a.elevation,
       ST_Difference(a.the_geom, b.the_geom) AS the_geom
FROM base.cuy_contours_2_1 AS a,
     public.contour_clip_bounds AS b;
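One caveat worth noting: the query above computes ST_Difference for every contour/polygon pair in the cross join. A common variant only subtracts where the geometries actually overlap and copies untouched contours through unchanged; the CASE/ST_Intersects pattern below is an assumed optimization sketch, not the author’s original query (table and column names follow the post):

```sql
-- Sketch: subtract the clip polygons only where they actually overlap
-- a contour; contours that touch no clip polygon pass through as-is.
-- The CASE/ST_Intersects pattern is an assumption, not the original query.
CREATE TABLE public.cuy_contours_2_clip AS
SELECT a.elevation,
       CASE
         WHEN ST_Intersects(a.the_geom, b.the_geom)
           THEN ST_Difference(a.the_geom, b.the_geom)
         ELSE a.the_geom
       END AS the_geom
FROM base.cuy_contours_2_1 AS a
LEFT JOIN public.contour_clip_bounds AS b
  ON a.the_geom && b.the_geom;
```

If a single contour crosses more than one clip polygon, this join would duplicate that row; in that case the difference should be taken against the union of the clip polygons (e.g. ST_Union) instead.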
Let’s see if PostGIS can handle a dataset this big…
2 thoughts on “Clipping Data”
Could you post some details about your data, to give an idea of PostGIS’s behaviour with respect to volume and type of data?
– data types (points? regularly gridded?)
– how many rows?
– what volume in MB/GB of initial data? of final tables?
– time to load each shapefile?
– characteristics of your server?
PostGIS is not really optimized to handle LIDAR data; that is a hopefully forthcoming feature, whenever someone sponsors it. Having detailed test cases is always interesting.
The ST_Difference was successful. I’m adding a spatial index to the resultant table right now so that I can view it in uDig and see if I got the results I was expecting.
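For reference, the index-building step described here would look something like the following; the index name is assumed, and the table and column names are taken from the clipping query above:

```sql
-- Build a GiST spatial index on the clipped contours so that uDig
-- can fetch features by bounding box efficiently.
CREATE INDEX cuy_contours_2_clip_gist
    ON public.cuy_contours_2_clip
 USING GIST (the_geom);
```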
In quick response, some known limitations of our PostGIS implementation include that it sits on a RAID 5 system. When I have the budget to fix this, I’ll move just the PostgreSQL database over to a small RAID 10. The total size of the database is 250GB, although I know that I have some geometry zombies as a result of not dropping tables correctly. I’ll need to run Probe_Geometry_Columns() to find the zombies before we go into production mode, so I’m guessing the real size of the whole database may be half the current size.
The system runs on a Windows Server 2003 Standard installation, i.e. 32-bit. Available RAM for the application space is 3GB (I’ve got the 3GB switch thrown in the startup configuration), but much of that RAM is hogged by some GeoServer instances running on the same server.
In this case, I was working with a contour dataset, so these are linear contour features derived from a LiDAR point cloud and breaklines. The contour dataset as shapefiles is about 25GB. I don’t know the table size as a PostGIS table, or how to check it. It is a 3D entity, so we have a z for each vertex. It might make sense to convert it to a 2D object, as we have z in the attribute field, but converting it to 2D as shapefiles didn’t seem possible with the tools I have at my disposal.
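Two of the unknowns above can be checked from SQL. These are standard PostgreSQL/PostGIS functions (pg_total_relation_size for on-disk size, ST_Force_2D for dropping the z coordinate in PostGIS 1.x; newer releases spell it ST_Force2D); the table name is the one from the clipping query:

```sql
-- On-disk size of the contour table, including its indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('base.cuy_contours_2_1'));

-- Drop the z coordinate once elevation is stored as an attribute.
-- ST_Force_2D is the PostGIS 1.x name; later releases use ST_Force2D.
UPDATE base.cuy_contours_2_1
   SET the_geom = ST_Force_2D(the_geom);
```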
I don’t know the exact load time of the shapefiles, but for both the lidar points and the contours, it completes somewhere between when I leave it at 5PM and when I come in at 8AM. The shapefiles, 143 of them, range in size from 800MB down to 30MB.
When I’m working with a LiDAR point cloud, I’d like to work with about 990 million points, and the same number of rows. I haven’t tried any multipoint optimizations yet. The points are irregularly spaced, as they are raw lidar hits, although I could use e.g. LAStools to grid them before inserting them into the database, if that would allow for some optimization.
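The multipoint optimization mentioned above is typically done by snapping points to a coarse grid and collecting each cell’s points into a single multipoint row, which cuts the row count dramatically. This is a sketch of that pattern, not something from the original posts; the table names and the 100-unit cell size are assumptions:

```sql
-- Sketch: collapse raw LiDAR points into one multipoint per 100-unit
-- grid cell, turning ~990M point rows into one row per occupied cell.
-- lidar_points / lidar_multipoints and the cell size are hypothetical.
CREATE TABLE lidar_multipoints AS
SELECT ST_Collect(the_geom) AS the_geom
FROM lidar_points
GROUP BY ST_SnapToGrid(the_geom, 100.0, 100.0);
```

The trade-off is that per-point attributes can no longer live in their own columns, so this suits workloads that query by area rather than by individual return.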
As I run things with LiDAR points/contours in the future, what are additional useful stats to have? Would running ANALYZE be helpful?
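On the ANALYZE question: running it after a bulk load is generally worthwhile, since the planner’s row estimates, and therefore its willingness to use a spatial index, depend on up-to-date statistics. A minimal example against the tables from this post:

```sql
-- Refresh planner statistics after the bulk load; without them the
-- planner may choose sequential scans over the spatial index.
ANALYZE base.cuy_contours_2_1;
ANALYZE public.cuy_contours_2_clip;
```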