Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Archive for the ‘PostGIS’ Category

Taking Slices from LiDAR data: Part IX

Posted by smathermather on February 20, 2017

Part 9 of N… , see e.g. my previous post on the topic.

We’ve been working to reduce the effect of overlapping samples on statistics we run on LiDAR data, and to do so, we’ve been using PDAL’s filters.sample approach. One catch: this handles the horizontal sampling problem well, but we might want to intentionally retain samples from high locations — after all, I want to see the trees for the forest and vice versa. So, it might behoove us to sample within each of our desired height classes to retain as much vertical information as possible.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part VIII

Posted by smathermather on February 18, 2017

Part 8 of N… , see e.g. my previous post on the topic.

I didn’t think my explanation of sampling problems with LiDAR data in my previous post was adequate. Here are a couple more figures for clarification.

We can take this dataset over trees, water, fences, and buildings that is heavily sampled in some areas and sparsely sampled in others and use PDAL’s filters.sample (Poisson dart-throwing) to create an evenly sampled version of the dataset.

Figure showing overlap of LiDAR scanlines

Figure showing overlap of LiDAR scanlines

Figure showing data resampled for eveness

Figure showing data resampled for evenness

An extra special thanks to the PDAL team for not only building such cool software, but being so responsive to questions!

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part VII

Posted by smathermather on February 15, 2017

Part 7 of N… , see e.g. my previous post on the topic.

More work on taking LiDAR slices. This time, the blog post is all about data preparation. LiDAR data, in its raw form, often has scan line effects when we look at density of points.

lidar_flightlines

This can affect statistics we are running, as our sampling effort is not even. To ameliorate this affect a bit, we can decimate our point cloud before doing further work with it. In PDAL, we have three choices for decimation: filters.decimation, which samples every Nth point from the point cloud; filters.voxelgrid, which does volumetric pixel based resampling; and filters.sample or “Poisson sampling via ‘Dart Throwing'”.

filters.decimation won’t help us with the above problem. Voxelgrid sampling could help, but it’s very regular, so I reject this on beauty grounds alone. This leaves filters.sample.

The nice thing about both the voxelgrid and the poisson sampling is that they retain much of the shape of the point cloud while down sampling the data:

subsample-ex1

subsample-ex2

We will execute the poisson sampling in PDAL. As many things in PDAL are best done with a (json) pipeline file, we construct a pipeline file describing the filtering we want to do, and then call that from the command line:

We can slice our data up similar to previous posts, and then look at the point density per slice. R-code for doing this forthcoming (thanks to Chris Tracey at Western Pennsylvania Conservancy and the LidR project), but below is a graphic as a teaser. For the record, we will probably pursue a fully PDAL solution in the end, but really interesting results in the interim:

image001

More to come. Stay tuned.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 2 Comments »

Using PostGIS for Hydrologic Modeling (reblog)

Posted by smathermather on June 17, 2016

The Problem We have to filter out the roads and ditches without removing streams that cross roads or follow them closely. I’m going to use PostGIS to find the intersection of the streams lines data with a buffered roads polygon. If the intersected line is less than 50% of the length of the stream line, […]

via Filtering Roads from Extracted Streams Data — GeoKota

Posted in Database, PostGIS, PostgreSQL, SQL | Tagged: , , , , , | Leave a Comment »

Using foreign data wrapper to use PostGIS with SQLServer

Posted by smathermather on May 29, 2016

Here was the problem that needed solved last week (we have a few similar problems in upcoming projects, so this was an exciting thing to try): we needed to use PostGIS to access data in a SQLServer database. The SQLServer database backs the web site in question, the underlying content management system, etc., so no– removing SQLServer isn’t really an option at this stage. Obviously PostGIS is a requirement too… .

Before I go further, I used tds_fdw as the foreign data wrapper. The limitations here are as follows: it is a read-only connection, and only works with Sybase and SQLServer, as it uses tabular data stream protocol for communicating between client and server. This is not as generic a solution as we can use. Next time I’ll try ogr_fdw which is more generic as it can connect with other databases and other data types. Another advantage to ogr_fdw is we can use IMPORT FOREIGN SCHEMA. Regina Obe and Leo Hsu warn though to limit this to 150 tables or so for performance reasons.

With the limitations listed above, this is how we built the thang:

DROP SERVER beach_fdw CASCADE;

-- Create the server connection with FOREIGN DATA WRAPPER
CREATE SERVER beach_fdw
FOREIGN DATA WRAPPER tds_fdw
OPTIONS (servername 'name_or_ip', port '1433', database 'non-spatial-database', tds_version '7.1', msg_handler 'notice');

-- We map the postgres user to the user that can read the table we're interested in
CREATE USER MAPPING FOR postgres
SERVER beach_fdw
OPTIONS (username 'user', password 'password');

-- Create the actual foreign table connection
CREATE FOREIGN TABLE beach_closures (
AutoNumber int NOT NULL,
Title varchar NOT NULL,
StartDate timestamp without time zone NOT NULL,
WaterQuality varchar NOT NULL,
Latitude varchar NOT NULL,
Longitude varchar NOT NULL,
BeachStatus varchar NOT NULL,
ClosureColor varchar NOT NULL)
SERVER beach_fdw
OPTIONS (schema_name 'schema_name', table_name 'vw_CMPBeachClosures');

-- Now we create a spatial view using our longitude and latitude
CREATE VIEW v_beach_closures AS
SELECT
AutoNumber, Title, StartDate, WaterQuality, Latitude,
Longitude, BeachStatus, ClosureColor, ST_SetSRID(ST_MakePoint(Longitude::numeric, Latitude::numeric), 4326)	AS geom
FROM beach_closures;

Voila! A nice little PostGIS enabled view of a SQLServer view or table!

Posted in Database, PostGIS, PostgreSQL, SQL | Tagged: , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part VI

Posted by smathermather on March 19, 2016

I finally got PDAL properly compiled with Point Cloud Library (PCL) baked in. Word to the wise — CLANG is what the makers are using to compile. The PDAL crew were kind enough to revert the commit which broke GCC support, but why swim upstream? If you are compiling PDAL yourself, use CLANG. (Side note, the revert to support GCC was really helpful for ensuring we could embed PDAL into OpenDroneMap without any compiler changes for that project.)

With a compiled version of PDAL with the PCL dependencies built in, I can bypass using the docker instance. When I was spawning tens of threads of Docker and then killing them, recovery was a problem (it would often hose my docker install completely). I’m sure there’s some bug to report there, or perhaps spawning 40 docker threads is ill advised for some grander reason, but regardless, running PDAL outside a container has many benefits, including simpler code. If you recall our objectives with this script, we want to:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.

The control script without docker becomes as follows:

#!/bin/bash 

# readlink gets us the full path to the file. This is necessary for docker
readlinker=`readlink -f $1`
# returns just the directory name
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# PDAL must be built with PCL.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html

pdal translate "$name".las "$name".bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,HeightAboveGround"

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
        do
        # We'll use the height classes to name our output files and tablename.
        # A little cleanup is necessary, so we're removing the colon ":".
        nameend=`echo $START | sed s/:/-/g`

        # Name our output
        bpfname=$name"_"$nameend.bpf

        # Implement the height range filter
        pdal translate $name.bpf $bpfname -f range --filters.range.limits="HeightAboveGround[$START)"

        # Now we put our data in the PostgreSQL database.
        pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='pa_layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

We still require our pipeline xml in order to set our default options as follows:

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="writers.pgpointcloud">
    <Option name="connection">
      host='localhost' dbname='user' user='user' password=‘password’
    </Option>
    <Option name="table">54001640PAN_heightasz_0-1.5</Option>
    <Option name="compression">dimensional</Option>
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <Reader type="readers.bpf">
      <Option name="filename">54001640PAN_heightasz_0-1.5.bpf</Option>
      </Reader>
    </Filter>
  </Writer>
</Pipeline>

And as before, we can use parallel to make this run a little lot faster:

find . -name '*.las' | parallel -j20 ./pdal_processor.sh

For the record, I found out through testing that my underlying host only has 20 processors (though more cores). No point in running more processes than that… .

So, when point clouds get loaded, they’re broken up in to “chips” or collections of points. How many chips do we have so far?:

user=# SELECT COUNT(*) FROM "pa_layer_0-1.5";
  count   
----------
 64413535
(1 row)

Now, how many rows is too many in a PostgreSQL database? Answer:

In other words, your typical state full of LiDAR (Pennsylvania or Ohio for example) are not too large to store, retrieve, and analyze. If you’re in California or Texas, or have super dense stuff that’s been flown recently, you will have to provide some structure in the form of partitioning your data into separate tables based on e.g. geography. You could also modify your “chipper” size in the XML file. I have used the default 400 points per patch (for about 25,765,414,000 points total), which is fine for my use case as then I do not exceed 100 million rows once the points are chipped:

      <Option name="capacity">400</Option>

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 3 Comments »

PostgreSQL table drop — a little fun with ouroboros

Posted by smathermather on February 11, 2016

My latests posts on PDAL have been fun. For the moment, a quick bit of code for dropping all the tables in your PostgreSQL database. BTW, the following is a bad idea. Many bad ideas are really useful. This is a really useful but bad idea:

echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" $3"\";"}' | psql

How does this work? Let’s do a quick step through.

First we echo “\dt”.

$ echo "\dt"
\dt

This just prints literally (the literally kind of literally) “\dt”. \dt is a psql command for listing all the tables in your database. We “pipe” (this symbol is a pipe: |) that into psql. Essentially what this does is run this command within psql as follows:

$ echo "\dt" | psql
               List of relations
 Schema |        Name        | Type  |  Owner  
--------+--------------------+-------+---------
 public | layer_0-1.5        | table | user
 public | layer_1.5-3        | table | user
 public | layer_105-150      | table | user
 public | layer_15-30        | table | user
 public | layer_150-200      | table | user
 public | layer_3-6          | table | user
 public | layer_30-45        | table | user
 public | layer_45-60        | table | user
 public | layer_6-15         | table | user
 public | layer_60-105       | table | user
 public | paw_points         | table | user
 public | pointcloud_formats | table | user
 public | spatial_ref_sys    | table | user
(13 rows)

Now that we have a list of tables, we can manipulate that list in order to manipulate our tables. We’ll use the grep command to go line by line and only return the ones with the word “layer” in them. These are the table names that we are interested in dropping.

$ echo "\dt" | psql | grep layer
 public | layer_0-1.5        | table | user
 public | layer_1.5-3        | table | user
 public | layer_105-150      | table | user
 public | layer_15-30        | table | user
 public | layer_150-200      | table | user
 public | layer_3-6          | table | user
 public | layer_30-45        | table | user
 public | layer_45-60        | table | user
 public | layer_6-15         | table | user
 public | layer_60-105       | table | user

Now we use the awk command to just grab the column we want. By default, awk assumes spaces are our column delimiter, so we’ll grab the third column, however we could also use the pipes as our delimiter with the -F flag:

$ echo "\dt" | psql | grep layer | awk '{print $3}'
layer_0-1.5
layer_1.5-3
layer_105-150
layer_15-30
layer_150-200
layer_3-6
layer_30-45
layer_45-60
layer_6-15
layer_60-105

Let us next extend the awk command to print the commands we want to run against our database. Note I had to escape my double quotes with a “\”:

$ echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" $3"\";"}' 
DROP TABLE "layer_0-1.5";
DROP TABLE "layer_1.5-3";
DROP TABLE "layer_105-150";
DROP TABLE "layer_15-30";
DROP TABLE "layer_150-200";
DROP TABLE "layer_3-6";
DROP TABLE "layer_30-45";
DROP TABLE "layer_45-60";
DROP TABLE "layer_6-15";
DROP TABLE "layer_60-105";

Now we have the commands we need to feed back into psql to run against the database. Here is where the dragon eats its own tail:

echo "\dt" | psql | grep layer | awk '{print "DROP TABLE \"" $3"\";"}' | psql
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE
DROP TABLE

Posted in Database, PostGIS, PostgreSQL | Tagged: , | 3 Comments »

Taking Slices from LiDAR data: Part V

Posted by smathermather on February 10, 2016

For this post, let’s combine the work in the last 4 posts in order to get a single pipeline for doing the following:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.
#!/bin/bash 

# readlink gets us the full path to the file. This is necessary for docker
readlinker=`readlink -f $1`
# returns just the directory name
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# Docker run allows us to leverage a pdal machine with pcl built in,
# thus allowing us to calculate height.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html
docker run -v $pathname:/data pdal/master pdal translate //data/"$name".las //data/"$name"_height.bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,Height";

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
 do
  # We'll use the height classes to name our output files and tablename.
  # A little cleanup is necessary, so we're removing the colon ":".
  nameend=`echo $START | sed s/:/-/g`

  # Name our output
  bpfname=$name"_"$nameend.bpf

  # Implement the height range filter
  pdal translate $name"_height".bpf $bpfname -f range --filters.range.limits="Height[$START)"

  # Now we put our data in the PostgreSQL database.
  pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

Now, we can use parallel to make this run a little faster:

find . -name "*.las" | parallel -j6 ./pdal_processor.sh {}&

Sadly, we can run into issues in running this in parallel:

PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.


PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.

This issue is a one time issue, however — we just can’t parallelize table creation. Once the tables are created however, I believe we can parallelize without issue. I’ll report if I find otherwise.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part IV

Posted by smathermather on February 5, 2016

Trans-Alaska Pipeline System Luca Galuzzi 2005

In PDAL, a pipeline file can be used to do a variety of operations. Within the following context, I think of a pipeline file like an ad hoc preferences file, where I can use an external command to iterate through the things I want to change, while holding constant everything else in the pipeline file.

In my use case for this vignette, I’ll use the pipeline file to hold my database preferences for putting the point clouds into my PostGIS database. For the record, I’m using vpicavet’s docker pggis as the starting place for installing PostGIS with the pgpointcloud extension. I have adapted the following pipeline file from the PDAL writers.pgpointcloud example.

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="writers.pgpointcloud">
    <Option name="connection">
      host='localhost' dbname=‘lidar’ user='user' password=‘password’
    </Option>
    <Option name="table">54001640PAN_heightasz_0-1.5</Option>
    <Option name="compression">dimensional</Option>
<!--    <Option name="srid">3270</Option> -->
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <Reader type="readers.las">
          <Option name="filename">54001640PAN_heightasz_0-1.5.las</Option>
<!--          <Option name="spatialreference">EPSG:3270</Option> -->
      </Reader>
    </Filter>
  </Writer>
</Pipeline>

Some things to note. I have commented out the SRID and readers.las.spatialreference in the XML above. We’ll rely on PDAL to discover this, and use the default output of epsg:4326 to store our point clouds for the time being.

Our wrapper script for the pipeline file is very simple. We will use the wrapper script to specify the table and the input file name.

#!/bin/bash

TABLE=`basename $1 .bpf`
INPUT=$1

#echo $TABLE
pdal pipeline -i pipeline.xml --writers.pgpointcloud.table=$TABLE --readers.bpf.filename=$INPUT

Now to use, we’ll use GNU Parallel, as much for it’s XArgs like functionality as scalability:

ls *.bpf | parallel -j6 ./pipeliner.sh {}

Now we can see what tables got loaded:

psql -d lidar
psql (9.5.0)
Type "help" for help.

lidar=# \dt
                    List of relations
 Schema |             Name              | Type  |  Owner  
--------+-------------------------------+-------+---------
 public | 54001640PAN_heightasz_0-1.5   | table | user
 public | 54001640PAN_heightasz_1.5-6   | table | user
 public | 54001640PAN_heightasz_100-200 | table | user
 public | 54001640PAN_heightasz_30-45   | table | user
 public | 54001640PAN_heightasz_45-55   | table | user
 public | 54001640PAN_heightasz_55-65   | table | user
 public | 54001640PAN_heightasz_6-30    | table | user
 public | 54001640PAN_heightasz_65-75   | table | user
 public | 54001640PAN_heightasz_75-85   | table | user
 public | 54001640PAN_heightasz_85-100  | table | user
 public | pointcloud_formats            | table | user
 public | spatial_ref_sys               | table | user
(12 rows)

W00t! We’ve got point clouds in our database! Next, we will visualize the data, and extract some analyses from it.

Posted in 3D, LiDAR, Other, PDAL, pointcloud | Tagged: , , | Leave a Comment »

Taking Slices from LiDAR data: Part III

Posted by smathermather on February 5, 2016

forest_structure
Borrowed from: http://irwantoshut.com

Continuing my series on slicing LiDAR data in order to analyze a forest, one of the objectives of the current project is to understand the habitats that particular species of birds prefer. This will be accomplished using field info from breeding bird surveys combined with LiDAR data of forest structure to help predict what habitats are necessary for particular species of breeding birds.

There are a number of studies doing just this for a variety of regions which I will detail later, but suffice it to say, structure of vegetation matters a lot to birds, so using LiDAR to map out structure can be an important predictive tool for mapping bird habitat. Being able to do that at scale across entire ecoregions—well, that’s just an exciting prospect.
Let’s get to some coding. I would like to take a few slices through our forest based on functional height groups:

Forest Canopy >15 meters
Sub-Canopy 10-15 meters
Tall Shrub 2-10 meters
Short Shrub 0.5-2 meters

(For the record, we don’t have to get these perfectly right. Sub-canopy, for example could be taller or shorter depending on how tall a forest it is in. This is just a convenience for dividing and reviewing the data for the time being. Also, we’ll split a little finer than the above numbers just for giggles.)

We’ll start by just echoing our pdal translate to make sure we like what we are getting for output:

for START in 0:0.5 0.5:1 1:2 2:5 5:10 10:15 15:20 20:35 35:50 50:75 75:100 100:200
 do
  nameend=`echo $START | sed s/:/-/g`
  echo pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_$nameend.bpf -f range --filters.range.limits="Height[$START)"
done

Thus we get the following output:

pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_0-0.5.bpf -f range --filters.range.limits=Height[0:0.5)
pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_0.5-1.bpf -f range --filters.range.limits=Height[0.5:1)
pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_1-2.bpf -f range --filters.range.limits=Height[1:2)
pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_2-5.bpf -f range --filters.range.limits=Height[2:5)
... etc.

Let’s remove our echo statement so this actually runs:

for START in 0:0.5 0.5:1 1:2 2:5 5:10 10:15 15:20 20:35 35:50 50:75 75:100 100:200
 do
  nameend=`echo $START | sed s/:/-/g`
  pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_$nameend.bpf -f range --filters.range.limits="Height[$START)"
done

We should generalize this a bit too. Let’s make this a script to which we can pass our filenames and ranges:

#!/bin/bash
namer=`basename $1 .bpf`
for START in $2
 do
  nameend=`echo $START | sed s/:/-/g`
  pdal translate $namer.bpf $namer"_"$nameend".bpf" -f range --filters.range.limits="Height[$START)"
done

Which to run, we use a statement as follows:

./tree_slicer.sh 54001640PAN_heightasz.bpf "0:0.5 0.5:1 1:2 2:5 5:10 10:15 15:20 20:35 35:50 50:75 75:100 100:200"

I forgot all my data is in feet, but my height classes in meters, so we’ll just multiply our input values by a factor of 3 to get in the same ballpark (future refinement likely):

./tree_slicer.sh 54001640PAN_heightasz.bpf "0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200"

We could alternatively stick to our original categories (short shrub, tall shrub, sub-canopy, canopy) as our break points:

./tree_slicer.sh 54001640PAN_heightasz.bpf "0:1.5 1.5:6 6:30 30:45 45:200"

Finally, we can convert to laz, and load all our slices of point clouds in plas.io an animate between the slices

for OUTPUT in $(ls *.bpf); do docker run -v /home/gisuser/test/test:/data pdal/master pdal translate //data/$OUTPUT //data/$OUTPUT.laz; done

If you look closely, you should be able to see where a tornado in the 80s knocked down much of the forest here. It’s signature is in the tall shrub / sub-canopy layer:

Posted in 3D, LiDAR, Other, PDAL, pointcloud | Tagged: , , | 2 Comments »