Archive for the ‘Database’ Category

A little Gorilla Time

Posted by smathermather on June 12, 2017

I miss my mountain gorilla friends in Rwanda. Let’s write a little more code to support them. I’ll be visiting Karisoke again next week, so it seems timely to post a little more code (HT Jean Pierre Samedi Mucyo for working with me on this one).

The problem today is simple — given a time series of gorilla locations and dates, can we calculate rate of travel using PostgreSQL? Well, of course we can. We can do anything in Postgre.

We have two tricks here:

1. The first is to order our data so we can just compare one row to the next.
2. Once we do that, we need simply to use PostGIS to calculate distance, and ordinary time functions from Postgres to calculate time difference.

This is my first use of WITH RECURSIVE, and it’s probably unnecessary (could be replaced with windowing functions), but I was very proud to finally get over my fear of WITH RECURSIVE . (We actually use one windowing function in our prep of the data. But there we are… ).

For the record, WITH RECURSIVE isn’t recursive, but it is useful here in allowing us to compare the current row with the previous.

Taking Slices from LiDAR data: Part IX

Posted by smathermather on February 20, 2017

Part 9 of N… , see e.g. my previous post on the topic.

We’ve been working to reduce the effect of overlapping samples on statistics we run on LiDAR data, and to do so, we’ve been using PDAL’s filters.sample approach. One catch: this handles the horizontal sampling problem well, but we might want to intentionally retain samples from high locations — after all, I want to see the trees for the forest and vice versa. So, it might behoove us to sample within each of our desired height classes to retain as much vertical information as possible.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 1 Comment »

Taking Slices from LiDAR data: Part VIII

Posted by smathermather on February 18, 2017

Part 8 of N… , see e.g. my previous post on the topic.

I didn’t think my explanation of sampling problems with LiDAR data in my previous post was adequate. Here are a couple more figures for clarification.

We can take this dataset over trees, water, fences, and buildings that is heavily sampled in some areas and sparsely sampled in others and use PDAL’s filters.sample (Poisson dart-throwing) to create an evenly sampled version of the dataset.

Figure showing overlap of LiDAR scanlines

Figure showing data resampled for evenness

An extra special thanks to the PDAL team for not only building such cool software, but being so responsive to questions!

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part VII

Posted by smathermather on February 15, 2017

Part 7 of N… , see e.g. my previous post on the topic.

More work on taking LiDAR slices. This time, the blog post is all about data preparation. LiDAR data, in its raw form, often has scan line effects when we look at density of points.

This can affect statistics we are running, as our sampling effort is not even. To ameliorate this affect a bit, we can decimate our point cloud before doing further work with it. In PDAL, we have three choices for decimation: filters.decimation, which samples every Nth point from the point cloud; filters.voxelgrid, which does volumetric pixel based resampling; and filters.sample or “Poisson sampling via ‘Dart Throwing'”.

filters.decimation won’t help us with the above problem. Voxelgrid sampling could help, but it’s very regular, so I reject this on beauty grounds alone. This leaves filters.sample.

The nice thing about both the voxelgrid and the poisson sampling is that they retain much of the shape of the point cloud while down sampling the data:

We will execute the poisson sampling in PDAL. As many things in PDAL are best done with a (json) pipeline file, we construct a pipeline file describing the filtering we want to do, and then call that from the command line:

We can slice our data up similar to previous posts, and then look at the point density per slice. R-code for doing this forthcoming (thanks to Chris Tracey at Western Pennsylvania Conservancy and the LidR project), but below is a graphic as a teaser. For the record, we will probably pursue a fully PDAL solution in the end, but really interesting results in the interim:

More to come. Stay tuned.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 2 Comments »

Using PostGIS for Hydrologic Modeling (reblog)

Posted by smathermather on June 17, 2016

The Problem We have to filter out the roads and ditches without removing streams that cross roads or follow them closely. I’m going to use PostGIS to find the intersection of the streams lines data with a buffered roads polygon. If the intersected line is less than 50% of the length of the stream line, […]

Posted in Database, PostGIS, PostgreSQL, SQL | Tagged: , , , , , | Leave a Comment »

Using foreign data wrapper to use PostGIS with SQLServer

Posted by smathermather on May 29, 2016

Here was the problem that needed solved last week (we have a few similar problems in upcoming projects, so this was an exciting thing to try): we needed to use PostGIS to access data in a SQLServer database. The SQLServer database backs the web site in question, the underlying content management system, etc., so no– removing SQLServer isn’t really an option at this stage. Obviously PostGIS is a requirement too… .

Before I go further, I used tds_fdw as the foreign data wrapper. The limitations here are as follows: it is a read-only connection, and only works with Sybase and SQLServer, as it uses tabular data stream protocol for communicating between client and server. This is not as generic a solution as we can use. Next time I’ll try ogr_fdw which is more generic as it can connect with other databases and other data types. Another advantage to ogr_fdw is we can use IMPORT FOREIGN SCHEMA. Regina Obe and Leo Hsu warn though to limit this to 150 tables or so for performance reasons.

With the limitations listed above, this is how we built the thang:

```DROP SERVER beach_fdw CASCADE;

-- Create the server connection with FOREIGN DATA WRAPPER
CREATE SERVER beach_fdw
FOREIGN DATA WRAPPER tds_fdw
OPTIONS (servername 'name_or_ip', port '1433', database 'non-spatial-database', tds_version '7.1', msg_handler 'notice');

-- We map the postgres user to the user that can read the table we're interested in
CREATE USER MAPPING FOR postgres
SERVER beach_fdw

-- Create the actual foreign table connection
CREATE FOREIGN TABLE beach_closures (
AutoNumber int NOT NULL,
Title varchar NOT NULL,
StartDate timestamp without time zone NOT NULL,
WaterQuality varchar NOT NULL,
Latitude varchar NOT NULL,
Longitude varchar NOT NULL,
BeachStatus varchar NOT NULL,
ClosureColor varchar NOT NULL)
SERVER beach_fdw
OPTIONS (schema_name 'schema_name', table_name 'vw_CMPBeachClosures');

-- Now we create a spatial view using our longitude and latitude
CREATE VIEW v_beach_closures AS
SELECT
AutoNumber, Title, StartDate, WaterQuality, Latitude,
Longitude, BeachStatus, ClosureColor, ST_SetSRID(ST_MakePoint(Longitude::numeric, Latitude::numeric), 4326)	AS geom
FROM beach_closures;

```

Voila! A nice little PostGIS enabled view of a SQLServer view or table!

Posted in Database, PostGIS, PostgreSQL, SQL | Tagged: , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part VI

Posted by smathermather on March 19, 2016

I finally got PDAL properly compiled with Point Cloud Library (PCL) baked in. Word to the wise — CLANG is what the makers are using to compile. The PDAL crew were kind enough to revert the commit which broke GCC support, but why swim upstream? If you are compiling PDAL yourself, use CLANG. (Side note, the revert to support GCC was really helpful for ensuring we could embed PDAL into OpenDroneMap without any compiler changes for that project.)

With a compiled version of PDAL with the PCL dependencies built in, I can bypass using the docker instance. When I was spawning tens of threads of Docker and then killing them, recovery was a problem (it would often hose my docker install completely). I’m sure there’s some bug to report there, or perhaps spawning 40 docker threads is ill advised for some grander reason, but regardless, running PDAL outside a container has many benefits, including simpler code. If you recall our objectives with this script, we want to:

• Calculate relative height of LiDAR data
• Slice that data into bands of heights
• Load the data into a PostgreSQL/PostGIS/pgPointCloud database.

The control script without docker becomes as follows:

```#!/bin/bash

# readlink gets us the full path to the file. This is necessary for docker
# returns just the directory name
# basename will strip off the directory name and the extension
name=`basename \$1 .las`

# PDAL must be built with PCL.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html

pdal translate "\$name".las "\$name".bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,HeightAboveGround"

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
do
# We'll use the height classes to name our output files and tablename.
# A little cleanup is necessary, so we're removing the colon ":".
nameend=`echo \$START | sed s/:/-/g`

# Name our output
bpfname=\$name"_"\$nameend.bpf

# Implement the height range filter
pdal translate \$name.bpf \$bpfname -f range --filters.range.limits="HeightAboveGround[\$START)"

# Now we put our data in the PostgreSQL database.
pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='pa_layer_'\$nameend --readers.bpf.filename=\$bpfname --writers.pgpointcloud.overwrite='false'
done
```

We still require our pipeline xml in order to set our default options as follows:

```<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
<Writer type="writers.pgpointcloud">
<Option name="connection">
</Option>
<Option name="table">54001640PAN_heightasz_0-1.5</Option>
<Option name="compression">dimensional</Option>
<Filter type="filters.chipper">
<Option name="capacity">400</Option>
<Option name="filename">54001640PAN_heightasz_0-1.5.bpf</Option>
</Filter>
</Writer>
</Pipeline>
```

And as before, we can use parallel to make this run a little lot faster:

```find . -name '*.las' | parallel -j20 ./pdal_processor.sh
```

For the record, I found out through testing that my underlying host only has 20 processors (though more cores). No point in running more processes than that… .

So, when point clouds get loaded, they’re broken up in to “chips” or collections of points. How many chips do we have so far?:

```user=# SELECT COUNT(*) FROM "pa_layer_0-1.5";
count
----------
64413535
(1 row)
```

Now, how many rows is too many in a PostgreSQL database? Answer:

In other words, your typical state full of LiDAR (Pennsylvania or Ohio for example) are not too large to store, retrieve, and analyze. If you’re in California or Texas, or have super dense stuff that’s been flown recently, you will have to provide some structure in the form of partitioning your data into separate tables based on e.g. geography. You could also modify your “chipper” size in the XML file. I have used the default 400 points per patch (for about 25,765,414,000 points total), which is fine for my use case as then I do not exceed 100 million rows once the points are chipped:

```      <Option name="capacity">400</Option>
```

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 3 Comments »

PostgreSQL table drop — a little fun with ouroboros

Posted by smathermather on February 11, 2016

My latests posts on PDAL have been fun. For the moment, a quick bit of code for dropping all the tables in your PostgreSQL database. BTW, the following is a bad idea. Many bad ideas are really useful. This is a really useful but bad idea:

```echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" \$3"\";"}' | psql
```

How does this work? Let’s do a quick step through.

First we echo “\dt”.

```\$ echo "\dt"
\dt
```

This just prints literally (the literally kind of literally) “\dt”. \dt is a psql command for listing all the tables in your database. We “pipe” (this symbol is a pipe: |) that into psql. Essentially what this does is run this command within psql as follows:

```\$ echo "\dt" | psql
List of relations
Schema |        Name        | Type  |  Owner
--------+--------------------+-------+---------
public | layer_0-1.5        | table | user
public | layer_1.5-3        | table | user
public | layer_105-150      | table | user
public | layer_15-30        | table | user
public | layer_150-200      | table | user
public | layer_3-6          | table | user
public | layer_30-45        | table | user
public | layer_45-60        | table | user
public | layer_6-15         | table | user
public | layer_60-105       | table | user
public | paw_points         | table | user
public | pointcloud_formats | table | user
public | spatial_ref_sys    | table | user
(13 rows)
```

Now that we have a list of tables, we can manipulate that list in order to manipulate our tables. We’ll use the grep command to go line by line and only return the ones with the word “layer” in them. These are the table names that we are interested in dropping.

```\$ echo "\dt" | psql | grep layer
public | layer_0-1.5        | table | user
public | layer_1.5-3        | table | user
public | layer_105-150      | table | user
public | layer_15-30        | table | user
public | layer_150-200      | table | user
public | layer_3-6          | table | user
public | layer_30-45        | table | user
public | layer_45-60        | table | user
public | layer_6-15         | table | user
public | layer_60-105       | table | user
```

Now we use the awk command to just grab the column we want. By default, awk assumes spaces are our column delimiter, so we’ll grab the third column, however we could also use the pipes as our delimiter with the -F flag:

```\$ echo "\dt" | psql | grep layer | awk '{print \$3}'
layer_0-1.5
layer_1.5-3
layer_105-150
layer_15-30
layer_150-200
layer_3-6
layer_30-45
layer_45-60
layer_6-15
layer_60-105
```

Let us next extend the awk command to print the commands we want to run against our database. Note I had to escape my double quotes with a “\”:

```\$ echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" \$3"\";"}'
DROP TABLE "layer_0-1.5";
DROP TABLE "layer_1.5-3";
DROP TABLE "layer_105-150";
DROP TABLE "layer_15-30";
DROP TABLE "layer_150-200";
DROP TABLE "layer_3-6";
DROP TABLE "layer_30-45";
DROP TABLE "layer_45-60";
DROP TABLE "layer_6-15";
DROP TABLE "layer_60-105";
```

Now we have the commands we need to feed back into psql to run against the database. Here is where the dragon eats its own tail:

```echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" \$3"\";"}' | psql
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
```

Posted in Database, PostGIS, PostgreSQL | Tagged: , | 3 Comments »

Taking Slices from LiDAR data: Part V

Posted by smathermather on February 10, 2016

For this post, let’s combine the work in the last 4 posts in order to get a single pipeline for doing the following:

• Calculate relative height of LiDAR data
• Slice that data into bands of heights
• Load the data into a PostgreSQL/PostGIS/pgPointCloud database.
```#!/bin/bash

# readlink gets us the full path to the file. This is necessary for docker
# returns just the directory name
# basename will strip off the directory name and the extension
name=`basename \$1 .las`

# Docker run allows us to leverage a pdal machine with pcl built in,
# thus allowing us to calculate height.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html
docker run -v \$pathname:/data pdal/master pdal translate //data/"\$name".las //data/"\$name"_height.bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,Height";

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
do
# We'll use the height classes to name our output files and tablename.
# A little cleanup is necessary, so we're removing the colon ":".
nameend=`echo \$START | sed s/:/-/g`

# Name our output
bpfname=\$name"_"\$nameend.bpf

# Implement the height range filter
pdal translate \$name"_height".bpf \$bpfname -f range --filters.range.limits="Height[\$START)"

# Now we put our data in the PostgreSQL database.
pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='layer_'\$nameend --readers.bpf.filename=\$bpfname --writers.pgpointcloud.overwrite='false'
done
```

Now, we can use parallel to make this run a little faster:

```find . -name "*.las" | parallel -j6 ./pdal_processor.sh {}&
```

Sadly, we can run into issues in running this in parallel:

```PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"

PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
```

This issue is a one time issue, however — we just can’t parallelize table creation. Once the tables are created however, I believe we can parallelize without issue. I’ll report if I find otherwise.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Point Clouds – the (re)start of a journey

Posted by smathermather on January 19, 2016

If you have followed this blog for a while, you will notice a continual returning to and refinement of ideas and topics. That’s how the blog started, and this role it has served, as a touch stone in my exploration of topics is critical to how I use it. I hope it is useful to you too, as a reader.

So, let’s talk about point clouds again. Point clouds are like ordinary geographic points — they have X, Y, and Z (typically), but there are a whole lot more of them than ordinary. How much more? Well, instead of thinking in the 100s of thousands, we think more along the lines of 100 of millions or billions of points.

The other thing we need to recognize about point clouds is that they may be n-dimensional — a LiDAR point cloud may have X, Y, Z, intensity, return number, class, etc..

Good Free and Open Source Software (FOSS) tools for dealing with point clouds include a spectrum of projects, including PDAL (Point Data Abstraction Library), and the Pointcloud extension for PostgreSQL (with PostGIS integration, of course).

Previously, I was attempting to process large amounts of LiDAR point clouds in order to model bird habitat. Now that I have the requisite storage, and PDAL has a height calculator, I am ready to dive back into this.

In the meantime, there are some cool things to check in this space. Let’s start with this presentation by Michael Smith on PDAL from FOSS4G in Seoul last year.

Want to see some of the incredible things you can do in the browser, go no further than here:

This goes deeper than simple vector tile approaches and employs level of detail optimizations that are necessary for this large of a use case. More on this later

postscript 1/20/2016:

Looks like I outed the above project… . Apologies to https://hobu.co for this. For the record and from their site:

Hobu is a team of three software engineers located in Iowa City, Iowa and Cambridge, Iowa. We have been on the forefront of open source LiDAR software for over five years, and we have been building open source GIS software for even longer. We provide open source lidar software systems development, open source GIS software development in addition to contract research and development, systems design, evaluation and implementation.