View of the sea crashing on the Kerala shore

Keralite Klusters

One of the problems we worked on in Kerala wasn’t a drone problem at all, but an infrastructure problem. Given a distribution of locales, how can we cluster those according to distance to help reduced duplicated infrastructure services (internet, electricity, etc.)? There are lots of ways to solve this, but we chose to do it in PostGIS because… Ok: because that’s what I always often do whether I should or not.

I worked on this with Deepthi Patric, the Geomatics expert for the group. For the record, the points below are not the points we actually analyzed, but a randomized distribution within occupied areas of Kerala.

Points across Kerala for clustering
Points across Kerala for clustering
View over Trivandrum, Kerala
View over Trivandrum, Kerala

In this case, PostGIS was a good choice of tool, at least once we started using it intelligently. At first we tried something less intelligent and more blunt wherein we buffered and created convex hulls with left joins, etc.:



	buffer AS (
		SELECT ST_Buffer(geom, 750) AS geom FROM vibhav
	unionn AS (
		SELECT ST_Union(geom) AS geom FROM buffer
	dump AS (
		SELECT (ST_Dump(geom)).geom FROM unionn
	hull AS (
		SELECT ST_ConcaveHull(geom, 1) AS geom FROM dump
	unionagain AS (
		SELECT ST_Union(geom) AS geom FROM hull
	dumpagain AS (
		SELECT (ST_Dump(geom)).geom FROM unionagain
	rownum AS (
		SELECT row_number() over() AS clusterid, geom FROM dumpagain
	leftjoin AS (
		SELECT v.gid, v.geom, user_name, name, address, longitude, latitude, clusterid
			FROM vibhav v LEFT JOIN rownum rn ON ST_Intersects(v.geom, rn.geom)

	SELECT * FROM leftjoin;

This was…  well. Not the best method. So, back to the drawing board. Let’s use something a little smarter. Since PostGIS 2.2, we’ve had ST_ClusterWithin. This promisingly named function is just what we need… . It does return a geometry collection, so we need to manipulate the results a bit, but the query is pretty reasonable in the end:

-- Drop our table just in case so we can rerun
DROP TABLE IF EXISTS clusterwithin;

-- Create table
CREATE TABLE clusterwithin AS (

-- We'll use the endlessly flexible CTE "WITH" to make our query readable
-- This has the effect to creating a sequence of temporary tables
WITH unnestt AS (
-- First we cluster out points within 1000 meters
	SELECT unnest(ST_ClusterWithin(geom, 1000)) geomcollect
		FROM (SELECT * FROM vibhav_rand) vibhav
-- Then we dump out the clusters and assign them cluster ids in the process
dumpnest AS (
	SELECT row_number() over() AS clusterid, (ST_Dump(geomcollect)).geom AS geom FROM unnestt
-- It will be useful to know how many points are in each cluster later,
-- so let's calculate that now
countcluster AS (
	SELECT clusterid, count(clusterid) FROM dumpnest
		GROUP BY clusterid
-- We end with a left join to bring the count together with our clusters
SELECT dn.clusterid, count, geom FROM
	dumpnest dn LEFT JOIN countcluster cc ON dn.clusterid = cc.clusterid 


-- As it might be useful to draw an outline around each of our groups,
-- we do this with a convex hull:
DROP TABLE IF EXISTS clusterwithinn;

CREATE TABLE clusterwithinn AS (
SELECT clusterid, ST_ConvexHull(ST_Union(geom)) AS geom FROM
	clusterwithin WHERE count > 2
	GROUP BY clusterid

Thus changing individual points:

Inset of unclustered points.
Inset of unclustered points.

to smart clusters based on proximity:

Inset of points clustered into points within 1km of each other

Not too bad, and pretty quick to run as well.

Woman pointing at man on mountain top with another man looking on

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.