Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Posts Tagged ‘PostgreSQL’

Taking Slices from LiDAR data: Part IX

Posted by smathermather on February 20, 2017

Part 9 of N…; see, e.g., my previous post on the topic.

We’ve been working to reduce the effect of overlapping samples on statistics we run on LiDAR data, and to do so, we’ve been using PDAL’s filters.sample approach. One catch: this handles the horizontal sampling problem well, but we might want to intentionally retain samples from high locations — after all, I want to see the trees for the forest and vice versa. So, it might behoove us to sample within each of our desired height classes to retain as much vertical information as possible.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL

Taking Slices from LiDAR data: Part VIII

Posted by smathermather on February 18, 2017

Part 8 of N…; see, e.g., my previous post on the topic.

I didn’t think my explanation of sampling problems with LiDAR data in my previous post was adequate. Here are a couple more figures for clarification.

We can take this dataset over trees, water, fences, and buildings that is heavily sampled in some areas and sparsely sampled in others and use PDAL’s filters.sample (Poisson dart-throwing) to create an evenly sampled version of the dataset.

Figure showing overlap of LiDAR scanlines

Figure showing data resampled for evenness

An extra special thanks to the PDAL team for not only building such cool software, but being so responsive to questions!

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL

Taking Slices from LiDAR data: Part VII

Posted by smathermather on February 15, 2017

Part 7 of N…; see, e.g., my previous post on the topic.

More work on taking LiDAR slices. This time, the blog post is all about data preparation. LiDAR data in its raw form often shows scan line effects when we look at the density of points.

Figure: LiDAR flightline overlap visible in point density

This can affect statistics we are running, as our sampling effort is not even. To ameliorate this effect a bit, we can decimate our point cloud before doing further work with it. In PDAL, we have three choices for decimation: filters.decimation, which samples every Nth point from the point cloud; filters.voxelgrid, which does volumetric pixel based resampling; and filters.sample, or "Poisson sampling via 'Dart Throwing'".

filters.decimation won’t help us with the above problem. Voxelgrid sampling could help, but it’s very regular, so I reject this on beauty grounds alone. This leaves filters.sample.

The nice thing about both the voxelgrid and the Poisson sampling is that they retain much of the shape of the point cloud while downsampling the data:

Figures: two examples of point clouds retaining their shape after subsampling (subsample-ex1, subsample-ex2)

We will execute the Poisson sampling in PDAL. As with many things in PDAL, this is best done with a (JSON) pipeline file: we construct a pipeline describing the filtering we want to do, and then call that from the command line:
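A minimal pipeline for this looks something like the sketch below. The radius option sets the minimum spacing between retained points; the value of 1.0 is an assumption here, so tune it to your data's units and density:

{
  "pipeline": [
    "input.las",
    {
      "type": "filters.sample",
      "radius": 1.0
    },
    "resampled.las"
  ]
}

Saved as, say, sample.json, we can run it and substitute per-file names on the command line:

pdal pipeline sample.json --readers.las.filename=input.las --writers.las.filename=resampled.las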

We can slice our data up similar to previous posts, and then look at the point density per slice. R code for doing this is forthcoming (thanks to Chris Tracey at Western Pennsylvania Conservancy and the lidR project), but below is a graphic as a teaser. For the record, we will probably pursue a fully PDAL solution in the end, but there are really interesting results in the interim:

Figure: point density per height slice

More to come. Stay tuned.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL

Using foreign data wrapper to use PostGIS with SQLServer

Posted by smathermather on May 29, 2016

Here was the problem that needed solving last week (we have a few similar problems in upcoming projects, so this was an exciting thing to try): we needed to use PostGIS to access data in a SQLServer database. The SQLServer database backs the web site in question, the underlying content management system, etc., so no, removing SQLServer isn't really an option at this stage. Obviously PostGIS is a requirement too… .

Before I go further: I used tds_fdw as the foreign data wrapper. The limitations here are as follows: it is a read-only connection, and it only works with Sybase and SQLServer, as it uses the Tabular Data Stream (TDS) protocol for communicating between client and server. This is not as generic a solution as we might like. Next time I'll try ogr_fdw, which is more generic, as it can connect with other databases and other data types. Another advantage of ogr_fdw is that we can use IMPORT FOREIGN SCHEMA. Regina Obe and Leo Hsu warn, though, to limit this to 150 tables or so for performance reasons.
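For reference, the ogr_fdw version would look something like the following. I haven't run this yet, so consider the connection string and options an untested sketch:

-- hypothetical ogr_fdw equivalent (untested sketch)
CREATE EXTENSION IF NOT EXISTS ogr_fdw;

CREATE SERVER beach_ogr_fdw
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
  datasource 'MSSQL:server=name_or_ip;database=non-spatial-database;uid=user;pwd=password',
  format 'MSSQLSpatial');

-- ogr_all imports every layer OGR can see; keep it to ~150 tables or fewer
CREATE SCHEMA beach;
IMPORT FOREIGN SCHEMA ogr_all
FROM SERVER beach_ogr_fdw INTO beach;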

With the limitations listed above, this is how we built the thang:

DROP SERVER beach_fdw CASCADE;

-- Create the server connection with FOREIGN DATA WRAPPER
CREATE SERVER beach_fdw
FOREIGN DATA WRAPPER tds_fdw
OPTIONS (servername 'name_or_ip', port '1433', database 'non-spatial-database', tds_version '7.1', msg_handler 'notice');

-- We map the postgres user to the user that can read the table we're interested in
CREATE USER MAPPING FOR postgres
SERVER beach_fdw
OPTIONS (username 'user', password 'password');

-- Create the actual foreign table connection
CREATE FOREIGN TABLE beach_closures (
AutoNumber int NOT NULL,
Title varchar NOT NULL,
StartDate timestamp without time zone NOT NULL,
WaterQuality varchar NOT NULL,
Latitude varchar NOT NULL,
Longitude varchar NOT NULL,
BeachStatus varchar NOT NULL,
ClosureColor varchar NOT NULL)
SERVER beach_fdw
OPTIONS (schema_name 'schema_name', table_name 'vw_CMPBeachClosures');

-- Now we create a spatial view using our longitude and latitude
CREATE VIEW v_beach_closures AS
SELECT
AutoNumber, Title, StartDate, WaterQuality, Latitude,
Longitude, BeachStatus, ClosureColor, ST_SetSRID(ST_MakePoint(Longitude::numeric, Latitude::numeric), 4326) AS geom
FROM beach_closures;

Voila! A nice little PostGIS enabled view of a SQLServer view or table!
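Since the view behaves like any other PostGIS layer, we can query it spatially. For example (the coordinates and distance here are hypothetical), closures within 5 km of a point on the shore, newest first:

SELECT Title, StartDate, BeachStatus
FROM v_beach_closures
WHERE ST_DWithin(geom::geography,
                 ST_SetSRID(ST_MakePoint(-81.7, 41.49), 4326)::geography,
                 5000)
ORDER BY StartDate DESC;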

Posted in Database, PostGIS, PostgreSQL, SQL

Taking Slices from LiDAR data: Part VI

Posted by smathermather on March 19, 2016

I finally got PDAL properly compiled with Point Cloud Library (PCL) baked in. Word to the wise — CLANG is what the makers are using to compile. The PDAL crew were kind enough to revert the commit which broke GCC support, but why swim upstream? If you are compiling PDAL yourself, use CLANG. (Side note, the revert to support GCC was really helpful for ensuring we could embed PDAL into OpenDroneMap without any compiler changes for that project.)

With a compiled version of PDAL with the PCL dependencies built in, I can bypass using the docker instance. When I was spawning tens of threads of Docker and then killing them, recovery was a problem (it would often hose my docker install completely). I'm sure there's some bug to report there, or perhaps spawning 40 docker threads is ill-advised for some grander reason, but regardless, running PDAL outside a container has many benefits, including simpler code. If you recall our objectives with this script, we want to:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.

The control script without docker becomes as follows:

#!/bin/bash 

# readlink gets us the full path to the file
readlinker=`readlink -f $1`
# returns just the directory name (only needed by the Docker version; unused here)
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# PDAL must be built with PCL.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html

pdal translate "$name".las "$name".bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,HeightAboveGround"

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
        do
        # We'll use the height classes to name our output files and tablename.
        # A little cleanup is necessary, so we're removing the colon ":".
        nameend=`echo $START | sed s/:/-/g`

        # Name our output
        bpfname=$name"_"$nameend.bpf

        # Implement the height range filter
        pdal translate $name.bpf $bpfname -f range --filters.range.limits="HeightAboveGround[$START)"

        # Now we put our data in the PostgreSQL database.
        pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='pa_layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

We still require our pipeline XML in order to set our default options, as follows:

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="writers.pgpointcloud">
    <Option name="connection">
      host='localhost' dbname='user' user='user' password='password'
    </Option>
    <Option name="table">54001640PAN_heightasz_0-1.5</Option>
    <Option name="compression">dimensional</Option>
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <Reader type="readers.bpf">
        <Option name="filename">54001640PAN_heightasz_0-1.5.bpf</Option>
      </Reader>
    </Filter>
  </Writer>
</Pipeline>

And as before, we can use parallel to make this run a lot faster:

find . -name '*.las' | parallel -j20 ./pdal_processor.sh

For the record, I found out through testing that my underlying host only exposes 20 processors (though it has more cores). There is no point in running more processes than that… .

So, when point clouds get loaded, they're broken up into "chips", or collections of points. How many chips do we have so far?

user=# SELECT COUNT(*) FROM "pa_layer_0-1.5";
  count   
----------
 64413535
(1 row)
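Each chip holds up to 400 points (see the chipper capacity below), so we can also count the points themselves. This assumes pgPointcloud's default patch column name, pa:

user=# SELECT SUM(PC_NumPoints(pa)) FROM "pa_layer_0-1.5";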

Now, how many rows is too many in a PostgreSQL database? Answer: there is no hard limit on row count; a single table can grow to 32 TB.

In other words, your typical state full of LiDAR (Pennsylvania or Ohio, for example) is not too large to store, retrieve, and analyze. If you're in California or Texas, or have super dense stuff that's been flown recently, you will have to provide some structure in the form of partitioning your data into separate tables based on e.g. geography (a sketch of this follows below). You could also modify your "chipper" size in the XML file. I have used the default of 400 points per patch (for about 25,765,414,000 points total), which is fine for my use case, as then I do not exceed 100 million rows once the points are chipped:

      <Option name="capacity">400</Option>
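For the partitioning route, a minimal sketch using table inheritance might look like the following. The child table names and the longitude split are hypothetical, pa is pgPointcloud's default patch column, and the patch-to-geometry cast comes from the pointcloud_postgis extension:

-- hypothetical east/west split of one slice table
CREATE TABLE pa_layer_0_15_west () INHERITS ("pa_layer_0-1.5");
CREATE TABLE pa_layer_0_15_east () INHERITS ("pa_layer_0-1.5");

-- route existing patches by the longitude of their envelope centroids
INSERT INTO pa_layer_0_15_west
SELECT * FROM ONLY "pa_layer_0-1.5"
WHERE ST_X(ST_Centroid(pa::geometry)) < -77.5;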

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL

PostgreSQL table drop — a little fun with ouroboros

Posted by smathermather on February 11, 2016

My latest posts on PDAL have been fun. For the moment, a quick bit of code for dropping all the tables in your PostgreSQL database. BTW, the following is a bad idea. Many bad ideas are really useful. This is a really useful but bad idea:

echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" $3"\";"}' | psql

How does this work? Let’s do a quick step through.

First we echo “\dt”.

$ echo "\dt"
\dt

This just prints literally (the literally kind of literally) “\dt”. \dt is a psql command for listing all the tables in your database. We “pipe” (this symbol is a pipe: |) that into psql. Essentially what this does is run this command within psql as follows:

$ echo "\dt" | psql
               List of relations
 Schema |        Name        | Type  |  Owner  
--------+--------------------+-------+---------
 public | layer_0-1.5        | table | user
 public | layer_1.5-3        | table | user
 public | layer_105-150      | table | user
 public | layer_15-30        | table | user
 public | layer_150-200      | table | user
 public | layer_3-6          | table | user
 public | layer_30-45        | table | user
 public | layer_45-60        | table | user
 public | layer_6-15         | table | user
 public | layer_60-105       | table | user
 public | paw_points         | table | user
 public | pointcloud_formats | table | user
 public | spatial_ref_sys    | table | user
(13 rows)

Now that we have a list of tables, we can manipulate that list in order to manipulate our tables. We’ll use the grep command to go line by line and only return the ones with the word “layer” in them. These are the table names that we are interested in dropping.

$ echo "\dt" | psql | grep layer
 public | layer_0-1.5        | table | user
 public | layer_1.5-3        | table | user
 public | layer_105-150      | table | user
 public | layer_15-30        | table | user
 public | layer_150-200      | table | user
 public | layer_3-6          | table | user
 public | layer_30-45        | table | user
 public | layer_45-60        | table | user
 public | layer_6-15         | table | user
 public | layer_60-105       | table | user

Now we use the awk command to grab just the column we want. By default, awk assumes whitespace is the column delimiter, so we'll grab the third column; however, we could also use the pipes as our delimiter with the -F flag:

$ echo "\dt" | psql | grep layer | awk '{print $3}'
layer_0-1.5
layer_1.5-3
layer_105-150
layer_15-30
layer_150-200
layer_3-6
layer_30-45
layer_45-60
layer_6-15
layer_60-105

Let us next extend the awk command to print the commands we want to run against our database. Note that I had to escape my double quotes with a "\":

$ echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" $3"\";"}' 
DROP TABLE "layer_0-1.5";
DROP TABLE "layer_1.5-3";
DROP TABLE "layer_105-150";
DROP TABLE "layer_15-30";
DROP TABLE "layer_150-200";
DROP TABLE "layer_3-6";
DROP TABLE "layer_30-45";
DROP TABLE "layer_45-60";
DROP TABLE "layer_6-15";
DROP TABLE "layer_60-105";

Now we have the commands we need to feed back into psql to run against the database. Here is where the dragon eats its own tail:

echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" $3"\";"}' | psql
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
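For the record, the same dragon can eat its tail entirely inside PostgreSQL. A sketch of the equivalent (and equally dangerous) pure-SQL version, using a DO block and format():

DO $$
DECLARE
  r record;
BEGIN
  -- find every table whose name contains 'layer', as the grep did
  FOR r IN SELECT tablename FROM pg_tables
           WHERE schemaname = 'public' AND tablename LIKE '%layer%'
  LOOP
    -- %I quotes the identifier, as our escaped double quotes did
    EXECUTE format('DROP TABLE %I', r.tablename);
  END LOOP;
END
$$;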

Posted in Database, PostGIS, PostgreSQL

Taking Slices from LiDAR data: Part V

Posted by smathermather on February 10, 2016

For this post, let’s combine the work in the last 4 posts in order to get a single pipeline for doing the following:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.

#!/bin/bash

# readlink gets us the full path to the file. This is necessary for docker
readlinker=`readlink -f $1`
# returns just the directory name
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# Docker run allows us to leverage a pdal machine with pcl built in,
# thus allowing us to calculate height.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html
docker run -v $pathname:/data pdal/master pdal translate //data/"$name".las //data/"$name"_height.bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,Height";

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
 do
  # We'll use the height classes to name our output files and tablename.
  # A little cleanup is necessary, so we're removing the colon ":".
  nameend=`echo $START | sed s/:/-/g`

  # Name our output
  bpfname=$name"_"$nameend.bpf

  # Implement the height range filter
  pdal translate $name"_height".bpf $bpfname -f range --filters.range.limits="Height[$START)"

  # Now we put our data in the PostgreSQL database.
  pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

Now, we can use parallel to make this run a little faster:

find . -name "*.las" | parallel -j6 ./pdal_processor.sh {}

Sadly, we can run into issues running this in parallel:

PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.


PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.

This is a one-time issue, however: we just can't parallelize the table creation. Once the tables are created, I believe we can parallelize without issue. I'll report back if I find otherwise.
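The collision is in pointcloud_formats, the shared table in which pgPointcloud stores each point cloud schema under a pcid. Once its rows exist, the loaders only read it, which is why this should only bite on first creation:

SELECT pcid, srid FROM pointcloud_formats;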

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL

Yet another approach to ST_Buffer on geography

Posted by smathermather on January 13, 2016

Another approach to ST_Buffer would be to subdivide the geometries before buffering, and put them all together at the end. ST_SubDivide can do this for us. We can tell it how many vertices we want in each geometry (minimum of 8). Since _ST_BestSRID will try UTM first, we'll add enough nodes to ensure we always have 8 nodes within the 1,000,000 meter width of a UTM zone by segmentizing at 62,500 meters (a vertex every 62.5 kilometers, or 16 across the zone).

CREATE OR REPLACE FUNCTION ST_Buffer(g1 geography, radius_of_buffer float,
num_seg_quarter_circle integer)
RETURNS geography AS
$BODY$

WITH subdivided AS (
 -- points need no subdividing; everything else gets segmentized and split
 SELECT ST_SubDivide(ST_Segmentize(g1::geometry, 62500), 8) AS geom
 WHERE ST_GeometryType(g1::geometry) <> 'ST_Point'
 UNION ALL
 SELECT g1::geometry WHERE ST_GeometryType(g1::geometry) = 'ST_Point'
),
transformed_local AS (
 SELECT ST_Transform(geom, _ST_BestSRID(geom::geography)) AS geom FROM subdivided
),
buffered AS (
 SELECT ST_Buffer(geom, radius_of_buffer, num_seg_quarter_circle) AS geom
 FROM transformed_local
),
transformed_4326 AS (
 SELECT (ST_Dump(ST_Transform(geom, 4326))).geom AS geom FROM buffered
),
checksrid AS (
 SELECT geom FROM transformed_4326
)

SELECT ST_Union(geom)::geography FROM checksrid;
 
$BODY$
LANGUAGE sql VOLATILE;
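A quick smoke test, buffering a long line by 100 km with 8 segments per quarter circle (the coordinates are arbitrary):

SELECT ST_Buffer(
  ST_GeomFromText('LINESTRING(-80 40, -70 45)')::geography,
  100000, 8);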

Aaand, a somewhat unrelated image (just the buffered subdivision of a buffered point, not really an output from our function above). More on all this later.


Posted in Database, PostGIS, PostgreSQL, SQL

ST_Buffer on Geography — Iteration

Posted by smathermather on January 12, 2016

In my previous post, Imagico said:

This is nice but still fails in one important aspect – it will create significant inaccuracies once your buffering distance reaches values where buffering in a single local coordinate system is getting problematic which, depending on your precision requirements, will likely happen at distances of a few degrees. A solution to this would require either an iterative split and merge technique or a completely different approach on a very basic level (i.e. attempting truly spherical buffering).

Ok, ok. I kinda forgot about this aspect. The proper way to do this, as always, is to do it at a lower level than in an SQL function, preferably calculating the buffer on the spheroid, ellipsoid, or geoid. But, ouch. I think we can munge something good enough for now.

We’ll still do the buffer:

WITH transformed AS (
SELECT ST_Transform(g1::geometry, _ST_BestSRID(g1)) AS geom
),
buffered AS (
SELECT ST_Buffer(geom, radius_of_buffer, num_seg_quarter_circle) AS geom
FROM transformed
),
transformed_4326 AS (
SELECT ST_Transform(geom, 4326) AS geom FROM buffered
)
...

But now that we’ve performed the buffer, we need to determine that the buffered geometry is not too large for our local coordinate system. To do that, we’ll compare the best SRID for our buffered geometry to the original best SRID that we calculated based on the original input geometry.

SELECT
CASE WHEN _ST_BestSRID(geom) = _ST_BestSRID(g1) THEN geom::geography

If the two SRIDs aren’t equal, then we need to use the new best SRID based on buffered geometry and buffer using that new best SRID.

ELSE ST_Transform(ST_Buffer(ST_Transform(g1::geometry, _ST_BestSRID(geom)), radius_of_buffer, num_seg_quarter_circle), 4326)::geography

Now let’s put it all together:

CREATE OR REPLACE FUNCTION ST_Buffer(g1 geography, radius_of_buffer float,
num_seg_quarter_circle integer)
RETURNS geography AS
$BODY$
 
WITH transformed AS (
SELECT ST_Transform(g1::geometry, _ST_BestSRID(g1)) AS geom
),
buffered AS (
SELECT ST_Buffer(geom, radius_of_buffer, num_seg_quarter_circle) AS geom
FROM transformed
),
transformed_4326 AS (
SELECT ST_Transform(geom, 4326) AS geom FROM buffered
)
SELECT
 CASE WHEN _ST_BestSRID(geom) = _ST_BestSRID(g1) THEN geom::geography
 ELSE ST_Transform(ST_Buffer(ST_Transform(g1::geometry, _ST_BestSRID(geom)), radius_of_buffer, num_seg_quarter_circle), 4326)::geography
 END
 FROM transformed_4326
 
$BODY$
LANGUAGE sql VOLATILE;


CREATE OR REPLACE FUNCTION ST_Buffer(g1 geography, radius_of_buffer float,
buffer_style_parameters text)
RETURNS geography AS
$BODY$
 
WITH transformed AS (
SELECT ST_Transform(g1::geometry, _ST_BestSRID(g1)) AS geom
),
buffered AS (
SELECT ST_Buffer(geom, radius_of_buffer, buffer_style_parameters) AS geom
FROM transformed
),
transformed_4326 AS (
SELECT ST_Transform(geom, 4326) AS geom FROM buffered
)
SELECT
 CASE WHEN _ST_BestSRID(geom) = _ST_BestSRID(g1) THEN geom::geography
 ELSE ST_Transform(ST_Buffer(ST_Transform(g1::geometry, _ST_BestSRID(geom)), radius_of_buffer, buffer_style_parameters), 4326)::geography
 END
 FROM transformed_4326
 
$BODY$
LANGUAGE sql VOLATILE;

So, what’s the affect? Pretty significant really.

SELECT ST_Buffer(
	 ST_GeomFromText(
	  'POINT(0 0)'
	 )::geography, 5000000, 50);
Blue line showing correct buffer; dashed red line showing distorted buffer from using the wrong local coordinate system.

Posted in Database, PostGIS, PostgreSQL, SQL

Final?: ST_Buffer on Geography, or My love for @NullIsland

Posted by smathermather on January 11, 2016

Thanks to Paul Norman’s reminder in a previous post, we now have all the pieces we need to complete an ST_Buffer function that exposes all the wonderful goodness of buffer style parameters to geography users and also chooses the best local coordinate system automatically.

We use _ST_BestSRID to choose our local coordinate system.

Remember my previous three disadvantages:

  1. It didn’t do this at a low enough level to automatically use the best available local coordinate system. This leaves it to the user to choose the best available local coordinate system
  2. It used a different function name than you otherwise use for buffering.
  3. Finally, it wouldn’t let you use all the other parameters that ST_Buffer exposes through the use of buffer style parameters.

They are now all solved.


CREATE OR REPLACE FUNCTION ST_Buffer(g1 geography, radius_of_buffer float,
num_seg_quarter_circle integer)
RETURNS geography AS
$BODY$

WITH transformed AS (
SELECT ST_Transform(g1::geometry, _ST_BestSRID(g1)) AS geom
),
buffered AS (
SELECT ST_Buffer(geom, radius_of_buffer, num_seg_quarter_circle) AS geom
FROM transformed
),
transformed_4326 AS (
SELECT ST_Transform(geom, 4326) AS geom FROM buffered
)
SELECT geom::geography FROM transformed_4326

$BODY$
LANGUAGE sql VOLATILE;
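For instance, a 100 km buffer with 8 segments per quarter circle (the point is arbitrary):

SELECT ST_Buffer(
  ST_GeomFromText('POINT(-81.7 41.5)')::geography,
  100000, 8);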

 

CREATE OR REPLACE FUNCTION ST_Buffer(g1 geography, radius_of_buffer float,
buffer_style_parameters text)
RETURNS geography AS
$BODY$

WITH transformed AS (
SELECT ST_Transform(g1::geometry, _ST_BestSRID(g1)) AS geom
),
buffered AS (
SELECT ST_Buffer(geom, radius_of_buffer, buffer_style_parameters) AS geom
FROM transformed
),
transformed_4326 AS (
SELECT ST_Transform(geom, 4326) AS geom FROM buffered
)
SELECT geom::geography FROM transformed_4326

$BODY$
LANGUAGE sql VOLATILE;

Let’s try this!

SELECT ST_Buffer(
ST_GeomFromText(
'LINESTRING(5 2.5,0 -2.50,-5 2.5)'
)::geography, 500000, 'join=mitre');
Heart shaped buffer over Null Island.

Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under ODbL.

(Poor editing earlier tonight. My apologies.)

Posted in Database, PostGIS, PostgreSQL, SQL