PostGIS Gripe—Limits to Postgre’s B-tree indexing—Followup

In an earlier post, I griped about the limits to Postgres B-tree indexing, in that for large PostGIS geometries, I couldn’t create a unique constraint for PostGIS geometry.  Abe Gillespie (blame me not him for any mistakes  in what follows), suggested that I hash the geometry and create an index on that.  In a follow-up e-mail, he suggested a series of tests in a trigger statement, first test on the bbox, then on the hashed geometry, then finally on the geometry itself upon insert, to determine if the geometry is unique or not.


Well, I’ve completed the first suggestion, but not yet the second.  Here’s my trigger statement:

DROP TRIGGER dissolver on public.reservation_boundaries_public_private_cm;

     UPDATE reservation_boundaries_public_private_cm SET hash = MD5(ST_AsBinary(the_geom));

   END IF;
$dissolver$ LANGUAGE plpgsql;

CREATE TRIGGER dissolver BEFORE AFTER INSERT ON public.reservation_boundaries_public_private_cm FOR EACH STATEMENT EXECUTE PROCEDURE dissolve_bounds();

So, each time I insert a new geometry, the field “hash” is updated with an MD5 hashed version of the geometry.  Now we want to test against this so we add a constraint:

ALTER TABLE public.reservation_boundaries_public_private_cm ADD CONSTRAINT geom_uniq UNIQUE (hash);

And it works a charm. The only problem is that if I were to get a hash collision, I would reject a record and not mean to. I’m not sure the statistical chances of this, but implementing the above flow chart would prevent such a possibility.  If my reading is right, the chances of a hash collision is nearly 1 in 2^128, so maybe I won’t worry this year… .

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.