A unique constraint implemented as a trigger checking hashed geometry seemed like a good idea, at least until I applied it to a dataset tens of gigabytes in size. Not surprisingly, inserts start off fast and slow down badly as the table grows, presumably because each insert has to probe all the existing rows for a matching hash. So I thought I’d approach duplicates another way: delete them after they exist. So for my table:
base.cuy_contours_2
I hashed the geometry and added an index:
UPDATE base.cuy_contours_2 SET hash = MD5(ST_AsBinary(the_geom));
CREATE INDEX cuy_contours_2_hash_idx ON base.cuy_contours_2 (hash);
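(This assumes the hash column already exists on the table; if it doesn’t, it needs to be added first. MD5() returns text, so:

ALTER TABLE base.cuy_contours_2 ADD COLUMN hash text;

)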
Next I’ll delete the duplicate geometries by comparing the hashes, keeping only the row with the highest gid for each hash:
DELETE FROM base.cuy_contours_2
WHERE gid NOT IN (
    SELECT MAX(dup.gid)
    FROM base.cuy_contours_2 AS dup
    GROUP BY dup.hash
);
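As an aside, NOT IN over a large subquery can be painfully slow in PostgreSQL. An equivalent formulation is a self-join with USING, which deletes every row that shares a hash with a row of higher gid; it often performs much better. A sketch against the same table:

DELETE FROM base.cuy_contours_2 AS a
USING base.cuy_contours_2 AS b
WHERE a.hash = b.hash
  AND a.gid < b.gid;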
But, after a very long time, I get an error:
ERROR: could not read block 356366 of relation 1663/17185/12118368: Permission denied
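(I can’t say for certain, but on Windows a “Permission denied” on a block read often points to antivirus or backup software grabbing the data files mid-query rather than to real corruption.)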
It’s a down week and most of my maps are cached, so we’ll take the postgres service down and run it in single-user mode:
postgres --single -D c:\postgre_data CM
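(For reference, the general form is postgres --single -D &lt;data-directory&gt; &lt;database&gt;, so CM here is the database name.)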
And rerun the command, all on one line as single-user mode expects, with no trailing semicolon:
DELETE FROM base.cuy_contours_2 WHERE gid NOT IN (SELECT MAX(dup.gid) FROM base.cuy_contours_2 AS dup GROUP BY dup.hash)

…and hope for the best.
Oh, and thanks to : for help with this one.
This didn’t work, and I don’t know why… Instead, I’ll try making a de-duplicated copy of the table and hope for the best. One way to do that (a sketch, using DISTINCT ON to keep a single row per hash; the new table name is just a placeholder):
CREATE TABLE base.cuy_contours_2_dedup AS
SELECT DISTINCT ON (hash) *
FROM base.cuy_contours_2
ORDER BY hash ASC;
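If that finishes, the remaining steps (assuming the placeholder name above, and remembering that CREATE TABLE AS copies no indexes or constraints) would be to sanity-check the row count and swap the tables:

SELECT COUNT(*) FROM base.cuy_contours_2_dedup;
DROP TABLE base.cuy_contours_2;
ALTER TABLE base.cuy_contours_2_dedup RENAME TO cuy_contours_2;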