Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Archive for the ‘pointcloud’ Category

Taking Slices from LiDAR data: Part IX

Posted by smathermather on February 20, 2017

Part 9 of N… , see e.g. my previous post on the topic.

We’ve been working to reduce the effect of overlapping samples on statistics we run on LiDAR data, and to do so, we’ve been using PDAL’s filters.sample approach. One catch: this handles the horizontal sampling problem well, but we might want to intentionally retain samples from high locations — after all, I want to see the trees for the forest and vice versa. So, it might behoove us to sample within each of our desired height classes to retain as much vertical information as possible.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part VIII

Posted by smathermather on February 18, 2017

Part 8 of N… , see e.g. my previous post on the topic.

I didn’t think my explanation of sampling problems with LiDAR data in my previous post was adequate. Here are a couple more figures for clarification.

We can take this dataset over trees, water, fences, and buildings that is heavily sampled in some areas and sparsely sampled in others and use PDAL’s filters.sample (Poisson dart-throwing) to create an evenly sampled version of the dataset.

Figure showing overlap of LiDAR scanlines

Figure showing overlap of LiDAR scanlines

Figure showing data resampled for eveness

Figure showing data resampled for evenness

An extra special thanks to the PDAL team for not only building such cool software, but being so responsive to questions!

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part VII

Posted by smathermather on February 15, 2017

Part 7 of N… , see e.g. my previous post on the topic.

More work on taking LiDAR slices. This time, the blog post is all about data preparation. LiDAR data, in its raw form, often has scan line effects when we look at density of points.

lidar_flightlines

This can affect statistics we are running, as our sampling effort is not even. To ameliorate this affect a bit, we can decimate our point cloud before doing further work with it. In PDAL, we have three choices for decimation: filters.decimation, which samples every Nth point from the point cloud; filters.voxelgrid, which does volumetric pixel based resampling; and filters.sample or “Poisson sampling via ‘Dart Throwing'”.

filters.decimation won’t help us with the above problem. Voxelgrid sampling could help, but it’s very regular, so I reject this on beauty grounds alone. This leaves filters.sample.

The nice thing about both the voxelgrid and the poisson sampling is that they retain much of the shape of the point cloud while down sampling the data:

subsample-ex1

subsample-ex2

We will execute the poisson sampling in PDAL. As many things in PDAL are best done with a (json) pipeline file, we construct a pipeline file describing the filtering we want to do, and then call that from the command line:

We can slice our data up similar to previous posts, and then look at the point density per slice. R-code for doing this forthcoming (thanks to Chris Tracey at Western Pennsylvania Conservancy and the LidR project), but below is a graphic as a teaser. For the record, we will probably pursue a fully PDAL solution in the end, but really interesting results in the interim:

image001

More to come. Stay tuned.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 2 Comments »

Taking Slices from LiDAR data: Part VI

Posted by smathermather on March 19, 2016

I finally got PDAL properly compiled with Point Cloud Library (PCL) baked in. Word to the wise — CLANG is what the makers are using to compile. The PDAL crew were kind enough to revert the commit which broke GCC support, but why swim upstream? If you are compiling PDAL yourself, use CLANG. (Side note, the revert to support GCC was really helpful for ensuring we could embed PDAL into OpenDroneMap without any compiler changes for that project.)

With a compiled version of PDAL with the PCL dependencies built in, I can bypass using the docker instance. When I was spawning tens of threads of Docker and then killing them, recovery was a problem (it would often hose my docker install completely). I’m sure there’s some bug to report there, or perhaps spawning 40 docker threads is ill advised for some grander reason, but regardless, running PDAL outside a container has many benefits, including simpler code. If you recall our objectives with this script, we want to:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.

The control script without docker becomes as follows:

#!/bin/bash 

# readlink gets us the full path to the file. This is necessary for docker
readlinker=`readlink -f $1`
# returns just the directory name
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# PDAL must be built with PCL.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html

pdal translate "$name".las "$name".bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,HeightAboveGround"

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
        do
        # We'll use the height classes to name our output files and tablename.
        # A little cleanup is necessary, so we're removing the colon ":".
        nameend=`echo $START | sed s/:/-/g`

        # Name our output
        bpfname=$name"_"$nameend.bpf

        # Implement the height range filter
        pdal translate $name.bpf $bpfname -f range --filters.range.limits="HeightAboveGround[$START)"

        # Now we put our data in the PostgreSQL database.
        pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='pa_layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

We still require our pipeline xml in order to set our default options as follows:

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="writers.pgpointcloud">
    <Option name="connection">
      host='localhost' dbname='user' user='user' password=‘password’
    </Option>
    <Option name="table">54001640PAN_heightasz_0-1.5</Option>
    <Option name="compression">dimensional</Option>
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <Reader type="readers.bpf">
      <Option name="filename">54001640PAN_heightasz_0-1.5.bpf</Option>
      </Reader>
    </Filter>
  </Writer>
</Pipeline>

And as before, we can use parallel to make this run a little lot faster:

find . -name '*.las' | parallel -j20 ./pdal_processor.sh

For the record, I found out through testing that my underlying host only has 20 processors (though more cores). No point in running more processes than that… .

So, when point clouds get loaded, they’re broken up in to “chips” or collections of points. How many chips do we have so far?:

user=# SELECT COUNT(*) FROM "pa_layer_0-1.5";
  count   
----------
 64413535
(1 row)

Now, how many rows is too many in a PostgreSQL database? Answer:

In other words, your typical state full of LiDAR (Pennsylvania or Ohio for example) are not too large to store, retrieve, and analyze. If you’re in California or Texas, or have super dense stuff that’s been flown recently, you will have to provide some structure in the form of partitioning your data into separate tables based on e.g. geography. You could also modify your “chipper” size in the XML file. I have used the default 400 points per patch (for about 25,765,414,000 points total), which is fine for my use case as then I do not exceed 100 million rows once the points are chipped:

      <Option name="capacity">400</Option>

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 3 Comments »

Taking Slices from LiDAR data: Part V

Posted by smathermather on February 10, 2016

For this post, let’s combine the work in the last 4 posts in order to get a single pipeline for doing the following:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.
#!/bin/bash 

# readlink gets us the full path to the file. This is necessary for docker
readlinker=`readlink -f $1`
# returns just the directory name
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# Docker run allows us to leverage a pdal machine with pcl built in,
# thus allowing us to calculate height.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html
docker run -v $pathname:/data pdal/master pdal translate //data/"$name".las //data/"$name"_height.bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,Height";

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
 do
  # We'll use the height classes to name our output files and tablename.
  # A little cleanup is necessary, so we're removing the colon ":".
  nameend=`echo $START | sed s/:/-/g`

  # Name our output
  bpfname=$name"_"$nameend.bpf

  # Implement the height range filter
  pdal translate $name"_height".bpf $bpfname -f range --filters.range.limits="Height[$START)"

  # Now we put our data in the PostgreSQL database.
  pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

Now, we can use parallel to make this run a little faster:

find . -name "*.las" | parallel -j6 ./pdal_processor.sh {}&

Sadly, we can run into issues in running this in parallel:

PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.


PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.

This issue is a one time issue, however — we just can’t parallelize table creation. Once the tables are created however, I believe we can parallelize without issue. I’ll report if I find otherwise.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part IV

Posted by smathermather on February 5, 2016

Trans-Alaska Pipeline System Luca Galuzzi 2005

In PDAL, a pipeline file can be used to do a variety of operations. Within the following context, I think of a pipeline file like an ad hoc preferences file, where I can use an external command to iterate through the things I want to change, while holding constant everything else in the pipeline file.

In my use case for this vignette, I’ll use the pipeline file to hold my database preferences for putting the point clouds into my PostGIS database. For the record, I’m using vpicavet’s docker pggis as the starting place for installing PostGIS with the pgpointcloud extension. I have adapted the following pipeline file from the PDAL writers.pgpointcloud example.

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="writers.pgpointcloud">
    <Option name="connection">
      host='localhost' dbname=‘lidar’ user='user' password=‘password’
    </Option>
    <Option name="table">54001640PAN_heightasz_0-1.5</Option>
    <Option name="compression">dimensional</Option>
<!--    <Option name="srid">3270</Option> -->
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <Reader type="readers.las">
          <Option name="filename">54001640PAN_heightasz_0-1.5.las</Option>
<!--          <Option name="spatialreference">EPSG:3270</Option> -->
      </Reader>
    </Filter>
  </Writer>
</Pipeline>

Some things to note. I have commented out the SRID and readers.las.spatialreference in the XML above. We’ll rely on PDAL to discover this, and use the default output of epsg:4326 to store our point clouds for the time being.

Our wrapper script for the pipeline file is very simple. We will use the wrapper script to specify the table and the input file name.

#!/bin/bash

TABLE=`basename $1 .bpf`
INPUT=$1

#echo $TABLE
pdal pipeline -i pipeline.xml --writers.pgpointcloud.table=$TABLE --readers.bpf.filename=$INPUT

Now to use, we’ll use GNU Parallel, as much for it’s XArgs like functionality as scalability:

ls *.bpf | parallel -j6 ./pipeliner.sh {}

Now we can see what tables got loaded:

psql -d lidar
psql (9.5.0)
Type "help" for help.

lidar=# \dt
                    List of relations
 Schema |             Name              | Type  |  Owner  
--------+-------------------------------+-------+---------
 public | 54001640PAN_heightasz_0-1.5   | table | user
 public | 54001640PAN_heightasz_1.5-6   | table | user
 public | 54001640PAN_heightasz_100-200 | table | user
 public | 54001640PAN_heightasz_30-45   | table | user
 public | 54001640PAN_heightasz_45-55   | table | user
 public | 54001640PAN_heightasz_55-65   | table | user
 public | 54001640PAN_heightasz_6-30    | table | user
 public | 54001640PAN_heightasz_65-75   | table | user
 public | 54001640PAN_heightasz_75-85   | table | user
 public | 54001640PAN_heightasz_85-100  | table | user
 public | pointcloud_formats            | table | user
 public | spatial_ref_sys               | table | user
(12 rows)

W00t! We’ve got point clouds in our database! Next, we will visualize the data, and extract some analyses from it.

Posted in 3D, LiDAR, Other, PDAL, pointcloud | Tagged: , , | Leave a Comment »

Taking Slices from LiDAR data: Part III

Posted by smathermather on February 5, 2016

forest_structure
Borrowed from: http://irwantoshut.com

Continuing my series on slicing LiDAR data in order to analyze a forest, one of the objectives of the current project is to understand the habitats that particular species of birds prefer. This will be accomplished using field info from breeding bird surveys combined with LiDAR data of forest structure to help predict what habitats are necessary for particular species of breeding birds.

There are a number of studies doing just this for a variety of regions which I will detail later, but suffice it to say, structure of vegetation matters a lot to birds, so using LiDAR to map out structure can be an important predictive tool for mapping bird habitat. Being able to do that at scale across entire ecoregions—well, that’s just an exciting prospect.
Let’s get to some coding. I would like to take a few slices through our forest based on functional height groups:

Forest Canopy >15 meters
Sub-Canopy 10-15 meters
Tall Shrub 2-10 meters
Short Shrub 0.5-2 meters

(For the record, we don’t have to get these perfectly right. Sub-canopy, for example could be taller or shorter depending on how tall a forest it is in. This is just a convenience for dividing and reviewing the data for the time being. Also, we’ll split a little finer than the above numbers just for giggles.)

We’ll start by just echoing our pdal translate to make sure we like what we are getting for output:

for START in 0:0.5 0.5:1 1:2 2:5 5:10 10:15 15:20 20:35 35:50 50:75 75:100 100:200
 do
  nameend=`echo $START | sed s/:/-/g`
  echo pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_$nameend.bpf -f range --filters.range.limits="Height[$START)"
done

Thus we get the following output:

pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_0-0.5.bpf -f range --filters.range.limits=Height[0:0.5)
pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_0.5-1.bpf -f range --filters.range.limits=Height[0.5:1)
pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_1-2.bpf -f range --filters.range.limits=Height[1:2)
pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_2-5.bpf -f range --filters.range.limits=Height[2:5)
... etc.

Let’s remove our echo statement so this actually runs:

for START in 0:0.5 0.5:1 1:2 2:5 5:10 10:15 15:20 20:35 35:50 50:75 75:100 100:200
 do
  nameend=`echo $START | sed s/:/-/g`
  pdal translate 54001640PAN_heightasz.bpf 54001640PAN_heightasz_$nameend.bpf -f range --filters.range.limits="Height[$START)"
done

We should generalize this a bit too. Let’s make this a script to which we can pass our filenames and ranges:

#!/bin/bash
namer=`basename $1 .bpf`
for START in $2
 do
  nameend=`echo $START | sed s/:/-/g`
  pdal translate $namer.bpf $namer"_"$nameend".bpf" -f range --filters.range.limits="Height[$START)"
done

Which to run, we use a statement as follows:

./tree_slicer.sh 54001640PAN_heightasz.bpf "0:0.5 0.5:1 1:2 2:5 5:10 10:15 15:20 20:35 35:50 50:75 75:100 100:200"

I forgot all my data is in feet, but my height classes in meters, so we’ll just multiply our input values by a factor of 3 to get in the same ballpark (future refinement likely):

./tree_slicer.sh 54001640PAN_heightasz.bpf "0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200"

We could alternatively stick to our original categories (short shrub, tall shrub, sub-canopy, canopy) as our break points:

./tree_slicer.sh 54001640PAN_heightasz.bpf "0:1.5 1.5:6 6:30 30:45 45:200"

Finally, we can convert to laz, and load all our slices of point clouds in plas.io an animate between the slices

for OUTPUT in $(ls *.bpf); do docker run -v /home/gisuser/test/test:/data pdal/master pdal translate //data/$OUTPUT //data/$OUTPUT.laz; done

If you look closely, you should be able to see where a tornado in the 80s knocked down much of the forest here. It’s signature is in the tall shrub / sub-canopy layer:

Posted in 3D, LiDAR, Other, PDAL, pointcloud | Tagged: , , | 2 Comments »

parallel processing in PDAL

Posted by smathermather on January 28, 2016

Frankfurt Airport tunnel.JPG
Frankfurt Airport tunnel” by Peter IsotaloOwn work. Licensed under CC BY-SA 3.0 via Commons.

In my ongoing quest to process all the LiDAR data for Pennsylvania and Ohio into one gigantic usable dataset, I finally had to break down and learn how to do parallel processing in BASH. Yes, I still need to jump on the Python band wagon (the wagon is even long in the tooth, if we choose to mix metaphors), but BASH makes me soooo happy.

So, in a previous post, I wanted to process relative height in a point cloud. By relative height, I mean height relative to ground. PDAL has a nice utility for this, and it’s pretty easy to use, if you get PDAL installed successfully.


pdal translate 55001640PAN.las 55001640PAN_height.bpf height --writers.bpf.output_dims=&amp;quot;X,Y,Z,Height&amp;quot;;

Installing PDAL is not too easy, so I used the dockerized version of PDAL and it worked great. Problem is, the dockerized version complicates my commands on the command line, especially if I want to run it on a bunch of files.

Naturally, the next step is to run it on a whole bunch of LiDAR files. For that I need a little control script which I called pdal_height.sh, and then I need to run that in a “for” loop.

#!/bin/bash
# Get the pathname from the input value
pathname="${1%/*}";

# Get the short name of the file, sans path and sans extension
name=`basename $1 .las`

docker run -v $pathname:/data pdal/master pdal translate //data/"$name".las //data/"$name"_height.bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId";

Now we need a basic for loop will take care of sending the las files into our pdal_height.sh, thus looping through all available las files:


for OUTPUT in $(ls *.las); do ~/./pdal_height.sh $OUTPUT; done;

This is great, but I calculated it would take 13 days to complete on my 58366 LiDAR files. We’re talking approximately 41,000 square miles of non-water areas for Ohio, and approximately 44,000 square miles of non-water areas for Pennsylvania. I’m on no particular timeline, but I’m not really that patient. Quick duckduckgo search later, and I remember the GNU Parallel project. It’s wicked easy to use for this use case.

ls *.las | parallel -j24 ~/./pdal_height.sh

How simple! First, we list our las files, then we “pipe” them as a list to parallel, we tell parallel we want it to spawn 24 independent processes using that list as the input for our pdal_height script.

Now we can run it on 24 cores simultaneously. Sadly, I have slow disks 😦 so really I ran it like this:

ls *.las | parallel -j6 ~/./pdal_height.sh

Time to upgrade my disks! Finally, I want to process my entire LiDAR dataset irrespective of location. For this, we use the find command, name the starting directory location, and tell it we want to search based on name.

find /home/myspecialuser/LiDAR/ -name "*.las" | parallel -j6 ~/./pdal_height.sh

Estimated completion time: 2 days. I can live with that until I get better disks. Only one problem, I should make sure this doesn’t stop if my network drops for any reason. Let’s wrap this in nohup which will prevent network-based hangups:

nohup sh -c 'find /home/myspecialuser/LiDAR -name "*.las" | parallel -j6 ~/./pdal_height.sh {}'

Posted in 3D, Analysis, parallel, PDAL, pointcloud | Tagged: , , , | Leave a Comment »

wget for downloading boatloads of data

Posted by smathermather on January 27, 2016

My current project to create a complete dataset of airborne LiDAR data for Ohio and Pennsylvania has been teaching me some simple, albeit really powerful tricks. We’ll just discuss one today — recursive use of wget. This allows us to download entire trees of web sites to mirror, or in our case download all the data. Additionally, wget works on ftp trees as well, with no real change in syntax.


wget -r ftp://pamap.pasda.psu.edu/pamap_lidar/cycle1/LAS/

Capture

I recognize that curl can be a more powerful utility than wget, but for dead simple problems like this, a dead simple tool is more than adequate.

Posted in 3D, Docker, PDAL, pointcloud | Tagged: , , , | Leave a Comment »

Point Clouds – the (re)start of a journey

Posted by smathermather on January 19, 2016

If you have followed this blog for a while, you will notice a continual returning to and refinement of ideas and topics. That’s how the blog started, and this role it has served, as a touch stone in my exploration of topics is critical to how I use it. I hope it is useful to you too, as a reader.

So, let’s talk about point clouds again. Point clouds are like ordinary geographic points — they have X, Y, and Z (typically), but there are a whole lot more of them than ordinary. How much more? Well, instead of thinking in the 100s of thousands, we think more along the lines of 100 of millions or billions of points.

The other thing we need to recognize about point clouds is that they may be n-dimensional — a LiDAR point cloud may have X, Y, Z, intensity, return number, class, etc..

Good Free and Open Source Software (FOSS) tools for dealing with point clouds include a spectrum of projects, including PDAL (Point Data Abstraction Library), and the Pointcloud extension for PostgreSQL (with PostGIS integration, of course).

Previously, I was attempting to process large amounts of LiDAR point clouds in order to model bird habitat. Now that I have the requisite storage, and PDAL has a height calculator, I am ready to dive back into this.

In the meantime, there are some cool things to check in this space. Let’s start with this presentation by Michael Smith on PDAL from FOSS4G in Seoul last year.

Screen Shot 2016-01-19 at 8.32.03 PM

Want to see some of the incredible things you can do in the browser, go no further than here:

Screen Shot 2016-01-19 at 8.34.52 PM.png

This goes deeper than simple vector tile approaches and employs level of detail optimizations that are necessary for this large of a use case. More on this later

postscript 1/20/2016:

Looks like I outed the above project… . Apologies to https://hobu.co for this. For the record and from their site:

Hobu is a team of three software engineers located in Iowa City, Iowa and Cambridge, Iowa. We have been on the forefront of open source LiDAR software for over five years, and we have been building open source GIS software for even longer. We provide open source lidar software systems development, open source GIS software development in addition to contract research and development, systems design, evaluation and implementation.

Please contact us about any of the open source software for which we provide support.

We are the primary support organization for the following libraries:

  • PDAL — Point Data Abstraction Library
  • plas.io — WebGL point cloud rendering
  • Greyhound — Point cloud data streaming
  • libLAS — ASPRS LAS read/write

Welp, there we go, an unintentional public launch. Awesome work though… .

Posted in 3D, Database, FLANN, pointcloud, PostGIS, PostgreSQL, SQL | Tagged: , , | 1 Comment »