An actual example of using the hashed geometry to uniquely constrain geometry (nothing really new here, just an example of using what we learned in the last post with a new problem):
I’ve got a contours dataset from the county where I work that is in 20 different shapefiles ranging in size from 500MB to 2GB. I want to put them into a PostGIS database in a single table, so that I can use PostGIS to slice and dice them into reasonable chunks for use by our engineers in AutoDesk products, by ArcGIS users in several divisions, and to improve spatial indexing when dump them into one table in our spatial database as well (ala Regina Obe/Boston GIS’s Map Dicing). This is a really detailed LiDAR-based contour dataset which used breaklines to make a really detailed and accurate product– which sadly is unusable in its current form because it takes too long to load any given section.
But, today, no slicing and dicing just yet. I loaded all the data and discovered duplicate geometries. It seems that some of these blocks that the data come in have overlaps. I don’t have a screen shot of it, but when I viewed the data in uDig, you can see the duplicate geometries because the transparency of the contour lines in the duplicated areas is less transparent, i.e. the lines are darker against a white background. I could search for and remove duplicates, but that seems heavy handed. I’m going to recreate the table with a unique constraint applied to the hashed geometry (and hope for no hash collisions).
First we create the table:
CREATE TABLE base.cuy_contours_2 ( gid serial NOT NULL, elevation smallint, shape_leng real, the_geom geometry, hash character(32), CONSTRAINT cuy_contours_2_pkey PRIMARY KEY (gid), CONSTRAINT geom_uniq UNIQUE (hash), CONSTRAINT enforce_dims_the_geom CHECK (ndims(the_geom) = 4), CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'LINESTRING'::text OR the_geom IS NULL), CONSTRAINT enforce_srid_the_geom CHECK (srid(the_geom) = 9102722) ) WITH (OIDS=FALSE); ALTER TABLE base.cuy_contours_2 OWNER TO postgres; CREATE INDEX cuy_contours_2_the_geom_idx ON base.cuy_contours_2 USING gist (the_geom);
Now we create our trigger that creates a hashed version of the geometry, “the_geom”, in a column called “hash”.
DROP TRIGGER hasher on base.cuy_contours_2; CREATE OR REPLACE FUNCTION hash_geom() RETURNS TRIGGER AS $hasher$ BEGIN IF(TG_OP='INSERT') THEN UPDATE base.cuy_contours_2 SET hash = MD5(ST_AsBinary(the_geom)); END IF; RETURN NEW; END; $hasher$ LANGUAGE plpgsql; CREATE TRIGGER hasher AFTER INSERT ON base.cuy_contours_2 FOR EACH STATEMENT EXECUTE PROCEDURE hash_geom();
And finally we add a unique constraint:
ALTER TABLE base.cuy_contours_2 ADD CONSTRAINT geom_uniq UNIQUE (hash);
Now, sadly, I have to use windows where I work, so the following is Windows Command language to load all my shapefiles into the PostGIS database. That said, I think the command, in this rare case, is more elegant than similar code in, say, BASH scripting:
for %f in (*.shp) do shp2pgsql -s 102722 -a -S -N skip %f base.cuy_contours_2 | psql -h bobs_server -d CM -U postgres
using the shp2pgsql -a flag for append, -S to enforce simple geometries, -N skip to ensure that we keep all records with non-null geometries to compensate for potential errors in the input dataset).