In an earlier post, I griped about the limits to Postgres B-tree indexing, in that for large PostGIS geometries, I couldn’t create a unique constraint for PostGIS geometry. Abe Gillespie (blame me not him for any mistakes in what follows), suggested that I hash the geometry and create an index on that. In a follow-up e-mail, he suggested a series of tests in a trigger statement, first test on the bbox, then on the hashed geometry, then finally on the geometry itself upon insert, to determine if the geometry is unique or not.
Well, I’ve completed the first suggestion, but not yet the second. Here’s my trigger statement:
DROP TRIGGER dissolver on public.reservation_boundaries_public_private_cm;
CREATE OR REPLACE FUNCTION dissolve_bounds() RETURNS TRIGGER AS $dissolver$
BEGIN
IF(TG_OP='INSERT') THEN
UPDATE reservation_boundaries_public_private_cm SET hash = MD5(ST_AsBinary(the_geom));
END IF;
RETURN NEW;
END;
$dissolver$ LANGUAGE plpgsql;
CREATE TRIGGER dissolver BEFORE AFTER INSERT ON public.reservation_boundaries_public_private_cm FOR EACH STATEMENT EXECUTE PROCEDURE dissolve_bounds();
So, each time I insert a new geometry, the field “hash” is updated with an MD5 hashed version of the geometry. Now we want to test against this so we add a constraint:
ALTER TABLE public.reservation_boundaries_public_private_cm ADD CONSTRAINT geom_uniq UNIQUE (hash);
And it works a charm. The only problem is that if I were to get a hash collision, I would reject a record and not mean to. I’m not sure the statistical chances of this, but implementing the above flow chart would prevent such a possibility. If my reading is right, the chances of a hash collision is nearly 1 in 2^128, so maybe I won’t worry this year… .