Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Posts Tagged ‘pointcloud’

OpenDroneMap on the road part II

Posted by smathermather on March 28, 2017

Thinking a little more about moderately large compute resources and their container (see previous post), I revised my analysis to look and see if we can fit these 10 NUCs plus switch and outlets into a carry-on sized case. It turns out, at first blush, it seems feasible:

pelican_cloud_1535

Posted in 3D, Docker, OpenDroneMap, PDAL | Tagged: , , | 5 Comments »

OpenDroneMap on the road

Posted by smathermather on March 27, 2017

Contemplation

This is a theoretical post. Imagine for a moment that OpenDroneMap can scale to the compute resources that you have in an elastic and sane way (we are short weeks away from the first work on this), and so, if you are a typical person in the high-speed internet world, you might be thinking, “Great! Let’s throw this up on the cloud!”

But imagine for a moment you are in a network limited environment. Do you process on a local laptop? Do you port around a desktop? The folks in the humanitarian space think about this a lot — depending on the project, one could spend weeks or months in network limited environments.

Enter POSM

Folks at American Red Cross (ARC) have been thinking about this a lot. What has resulted, in order to aid in mapping e.g. rural areas in West Africa is Portable OpenStreetMap, or POSM, a tool for doing all the OpenStreetMap stuff, but totally and temporarily offline.

The software for this is critical, but I’ve been increasingly interested in the hardware side of things. OpenDroneMap, even with it’s upcoming processing, memory, and scaling improvements will still require more compute resources than, say OpenMapKit and Field Papers. I’ve been contemplating that once the improvements are in place, what kind of compute center could you haul in the field with you?

I’m not just thinking humanitarian and development use cases either — what can we do to make processing drone imagery in the field faster? Can we make it fast enough to get results before leaving the field? Can we modify our flight planning based on the stream of data being processed and adapt while we are there? Our real costs for flying are often finding staff and weather windows that are good, and sometimes we miss opportunities in the delay between imagery capture and processing. How can we close that loop faster?

The NUC

On the hardware side of the house, the folks at ARC are using Intel NUC kits. For ODM, as I understand it, they go a step up in processing power from their specs to something with an i7. So, I got to thinking — can we put together a bunch of these, running on a generator, and not break the bank on weight (keep it under 50 lbs)? It turns out, maybe we can. For a round $10,000, you might assemble 10 of these 4-core NUCs with a network switch, stuff it into a Pelican Air 1605 case, with 320 GB RAM, and 2.5 TB of storage. More storage can be added if necessary.

This is a thought experiment so far, and may not be the best way to get compute resources in the field, your mileage may vary, etc., but it’s and interesting though.field_compute.PNG

Cost Breakdown

field_compute1

Follow up

Any thoughts? Anyone deployed serious compute resources to the field for drone image processing? I’d love to hear what you think.

Posted in 3D, Docker, OpenDroneMap, PDAL | Tagged: , , | 2 Comments »

Scaling OpenDroneMap, necessary (and fun!) next steps

Posted by smathermather on March 8, 2017

Project State

OpenDroneMap has really evolved since I first put together a concept project presented at FOSS4G Portland in 2014, and hacked with my first users (Michele M. Tobias & Alex Mandel). At this stage, we have a really nicely functioning tool that can take drone images and output high-quality geographic products. The project has 45 contributors, hundreds of users, and a really great community (special shout-out to Piero Toffanin and Dakota Benjamin without whom the project would be nowhere near as viable, active, or wonderful). Recent improvements can be roughly categorized into data quality improvements and usability improvements. Data quality improvements were aided by the inclusion better point cloud creation from OpenSfM and better texturing from mvs-texturing. Usability improvements have largely been in the development of WebODM as a great to use and easy-to-deploy front end for OpenDroneMap.

With momentum behind these two directions — improved usability and improved data output, it’s time to think a little about how we scale OpenDroneMap. It works great for individual flights (up to a few hundred images at a time), but a promise of open source projects is scalability. Regularly we get questions from the community about how they can run ODM on larger and larger datasets in a sustainable and elastic way. To answer these questions, let me outline where we are going.

Project Future

Incremental optimizations

When I stated that scalability is one of the promises of open source software. I mostly meant scaling up: if I need more computing resources with an open source project, I don’t have to purchase more software licenses, I just need to rent or buy more computing resources. But an important element to scalability is the per unit use of computing resources as well. If we are not efficient and thoughtful about how we use things on the small scale, then we are not maximizing our scaled up resources.  Are we efficient in memory usage; is our matching algorithm as accurate as possible for the level of accuracy thus being efficient with the processor resources I have; etc.? I think of this as improving OpenDroneMap’s ability to efficiently digest data.

Magic school bus going doing the digestive system

Incremental toolchain optimizations are thus part of this near future for OpenDroneMap (and by consequence OpenSfM, the underlying computer vision tools for OpenDroneMap), focusing on memory and processor resources. The additional benefit here is that small projects and small computing resources also benefit. For humanitarian and development contexts where compute and network resources are limiting, these incremental improvements are critical. Projects like American Red Cross’ Portable OpenStreetMap (POSM) will benefit from these improvements, as will anyone in the humanitarian and development communities that need efficient processing of drone imagery offline.

To this end, three approaches are being considered for incremental improvements.  Matching speed could be improved by the use of Cascade Hashing matching or Bag of Words based method.Memory improvements could come via improved correspondence graph data structures and possibly SLAM-like pose-graph methods for global adjustment of camera positions in order to avoid global bundle adjustment.

Figure from Bag of Words paper

Figure from Bag of Words paper

Large-scale pipeline

In addition to incremental improvements, for massive datasets we need an approach to splitting up our dataset into manageable chunks. If incremental improvements help us better and more quickly process datasets, the large-scale pipeline is the teeth of this approach — we need to cut and chew up our large datasets into smaller chunks to digest.

Image of Dr. Teeth of the Muppets.

Dr. Teeth

If for a given node I can process 1000 images efficiently, but I have 80,000 images, I need a process that splits my dataset into 80 manageable chunks and processes through them sequentially or in parallel until done. Maybe I have 9000 images? Then I need it split into 9 chunks.

Image over island showing grid of 9 for spliting an aerial dataset

Eventually, I want to synthesize the outputs back into a single dataset. Ideally I split the dataset with some overlap as follows:

Image over island showing grid of 9 for spliting an aerial dataset shown with overlap

Problems with splitting SfM datasets

We do run into some very real problems with splitting our datasets into chunks for processing. There are a variety of issues, but the most stark is consistency issues from the resultant products. Quite often our X, Y, and Z values won’t match in the final reconstructions. This becomes critical when performing, e.g. hydrologic analyses on resultant Digital Terrain Models.

Water flow on patched DEM showing pooling effects around discontinuities

Water flow on patched DEM showing pooling effects around discontinuities (credit: Anna Petrasova et al)

Anna Petrasova et al address merging disparate DEM’s in GRASS with Seamless fusion of high-resolution DEMs from multiple sources with r.patch.smooth.

Water flow on fused DEM

Water flow on fused DEM showing corrected flow (credit: Anna Petrasova et al)

What Anna describes and solves is the problem of matching LiDAR and drone data and assumes that the problems between the datasets are sufficiently small that smoothing the transition between the datasets is adequate. Unfortunately, when we process drone imagery in chunks, we can get translation, rotation, skewing, and a range of other differences that often cannot be accounted for when we’re processing the digital terrain model at the end.

What follows is a small video of a dataset split and processed in two chunks. Notice offsets, rotations, and other issues of mismatch in the X and Y dimensions, and especially Z.

When we see these differences in the resultant digital terrain model, the problem can be quite stark:

Elevation differences along seamline of merged OpenDroneMap DTMs

Elevation differences along seamline of merged OpenDroneMap DTMs

To address these issues we require both the approach that Anna proposes that fixes for and smooths out small differences, and a deeper approach specific to matching drone imagery datasets to address the larger problems.

Deeper approach to processing our bites of drone data

To ensure we are getting the most out of stitching these pieces of data back together at the end, we require using a very similar matching approach to what we use in the matching of images to each other. Our steps will be something like as follows:

  • Split our images to groups
  • Run reconstruction on each group
  • Align and tranform those groups to each other using matching features between the groups
  • For secondary products, like Digital Terrain Models, blend the outputs using an approach similar to r.patch.smooth.

In close

I hope you enjoyed a little update on some of the upcoming features for OpenDroneMap. In addition to the above, we’ll also be wrapping in reporting and robustness improvements. More on that soon, as that is another huge piece that will help the entire community of users.

(This post CC BY-SA 4.0 licensed)

(Shout out to Pau Gargallo Piracés of Mapillary for the technical aspects of this write up. He is not responsible for any of the mistakes, generalities, and distortions in the technical aspects. Those are all mine).

Posted in 3D, Docker, OpenDroneMap, OpenDroneMap, PDAL | Tagged: , , | 2 Comments »

Taking Slices from ~~LiDAR~~ OpenDroneMap data: Part X

Posted by smathermather on February 23, 2017

Part 10 of N… , wait. This is a lie. This post is actually about optical drone data, not LiDAR data. This is about next phase features fro OpenDroneMap — automated and semiautomation of the point clouds, creation of DTMs and other fun such stuff.

To date, we’ve only extracted Digital Surface Models from ODM — the top surface of everything in the scene. As it is useful for hydrological modeling and other purposes to have a Digital Terrain Model estimated, we’ll be including PDAL’s Progressive Morphological Filter for the sake of DEM extraction. Here’s a small preview:

Posted in 3D, Docker, OpenDroneMap, PDAL | Tagged: , , | Leave a Comment »

Taking Slices from LiDAR data: Part IX

Posted by smathermather on February 20, 2017

Part 9 of N… , see e.g. my previous post on the topic.

We’ve been working to reduce the effect of overlapping samples on statistics we run on LiDAR data, and to do so, we’ve been using PDAL’s filters.sample approach. One catch: this handles the horizontal sampling problem well, but we might want to intentionally retain samples from high locations — after all, I want to see the trees for the forest and vice versa. So, it might behoove us to sample within each of our desired height classes to retain as much vertical information as possible.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 1 Comment »

Taking Slices from LiDAR data: Part VIII

Posted by smathermather on February 18, 2017

Part 8 of N… , see e.g. my previous post on the topic.

I didn’t think my explanation of sampling problems with LiDAR data in my previous post was adequate. Here are a couple more figures for clarification.

We can take this dataset over trees, water, fences, and buildings that is heavily sampled in some areas and sparsely sampled in others and use PDAL’s filters.sample (Poisson dart-throwing) to create an evenly sampled version of the dataset.

Figure showing overlap of LiDAR scanlines

Figure showing overlap of LiDAR scanlines

Figure showing data resampled for eveness

Figure showing data resampled for evenness

An extra special thanks to the PDAL team for not only building such cool software, but being so responsive to questions!

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 1 Comment »

Taking Slices from LiDAR data: Part VII

Posted by smathermather on February 15, 2017

Part 7 of N… , see e.g. my previous post on the topic.

More work on taking LiDAR slices. This time, the blog post is all about data preparation. LiDAR data, in its raw form, often has scan line effects when we look at density of points.

lidar_flightlines

This can affect statistics we are running, as our sampling effort is not even. To ameliorate this affect a bit, we can decimate our point cloud before doing further work with it. In PDAL, we have three choices for decimation: filters.decimation, which samples every Nth point from the point cloud; filters.voxelgrid, which does volumetric pixel based resampling; and filters.sample or “Poisson sampling via ‘Dart Throwing'”.

filters.decimation won’t help us with the above problem. Voxelgrid sampling could help, but it’s very regular, so I reject this on beauty grounds alone. This leaves filters.sample.

The nice thing about both the voxelgrid and the poisson sampling is that they retain much of the shape of the point cloud while down sampling the data:

subsample-ex1

subsample-ex2

We will execute the poisson sampling in PDAL. As many things in PDAL are best done with a (json) pipeline file, we construct a pipeline file describing the filtering we want to do, and then call that from the command line:

We can slice our data up similar to previous posts, and then look at the point density per slice. R-code for doing this forthcoming (thanks to Chris Tracey at Western Pennsylvania Conservancy and the LidR project), but below is a graphic as a teaser. For the record, we will probably pursue a fully PDAL solution in the end, but really interesting results in the interim:

image001

More to come. Stay tuned.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 2 Comments »

Taking Slices from LiDAR data: Part VI

Posted by smathermather on March 19, 2016

I finally got PDAL properly compiled with Point Cloud Library (PCL) baked in. Word to the wise — CLANG is what the makers are using to compile. The PDAL crew were kind enough to revert the commit which broke GCC support, but why swim upstream? If you are compiling PDAL yourself, use CLANG. (Side note, the revert to support GCC was really helpful for ensuring we could embed PDAL into OpenDroneMap without any compiler changes for that project.)

With a compiled version of PDAL with the PCL dependencies built in, I can bypass using the docker instance. When I was spawning tens of threads of Docker and then killing them, recovery was a problem (it would often hose my docker install completely). I’m sure there’s some bug to report there, or perhaps spawning 40 docker threads is ill advised for some grander reason, but regardless, running PDAL outside a container has many benefits, including simpler code. If you recall our objectives with this script, we want to:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.

The control script without docker becomes as follows:

#!/bin/bash 

# readlink gets us the full path to the file. This is necessary for docker
readlinker=`readlink -f $1`
# returns just the directory name
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# PDAL must be built with PCL.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html

pdal translate "$name".las "$name".bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,HeightAboveGround"

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
        do
        # We'll use the height classes to name our output files and tablename.
        # A little cleanup is necessary, so we're removing the colon ":".
        nameend=`echo $START | sed s/:/-/g`

        # Name our output
        bpfname=$name"_"$nameend.bpf

        # Implement the height range filter
        pdal translate $name.bpf $bpfname -f range --filters.range.limits="HeightAboveGround[$START)"

        # Now we put our data in the PostgreSQL database.
        pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='pa_layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

We still require our pipeline xml in order to set our default options as follows:

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="writers.pgpointcloud">
    <Option name="connection">
      host='localhost' dbname='user' user='user' password=‘password’
    </Option>
    <Option name="table">54001640PAN_heightasz_0-1.5</Option>
    <Option name="compression">dimensional</Option>
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <Reader type="readers.bpf">
      <Option name="filename">54001640PAN_heightasz_0-1.5.bpf</Option>
      </Reader>
    </Filter>
  </Writer>
</Pipeline>

And as before, we can use parallel to make this run a little lot faster:

find . -name '*.las' | parallel -j20 ./pdal_processor.sh

For the record, I found out through testing that my underlying host only has 20 processors (though more cores). No point in running more processes than that… .

So, when point clouds get loaded, they’re broken up in to “chips” or collections of points. How many chips do we have so far?:

user=# SELECT COUNT(*) FROM "pa_layer_0-1.5";
  count   
----------
 64413535
(1 row)

Now, how many rows is too many in a PostgreSQL database? Answer:

In other words, your typical state full of LiDAR (Pennsylvania or Ohio for example) are not too large to store, retrieve, and analyze. If you’re in California or Texas, or have super dense stuff that’s been flown recently, you will have to provide some structure in the form of partitioning your data into separate tables based on e.g. geography. You could also modify your “chipper” size in the XML file. I have used the default 400 points per patch (for about 25,765,414,000 points total), which is fine for my use case as then I do not exceed 100 million rows once the points are chipped:

      <Option name="capacity">400</Option>

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | 3 Comments »

Taking Slices from LiDAR data: Part V

Posted by smathermather on February 10, 2016

For this post, let’s combine the work in the last 4 posts in order to get a single pipeline for doing the following:

  • Calculate relative height of LiDAR data
  • Slice that data into bands of heights
  • Load the data into a PostgreSQL/PostGIS/pgPointCloud database.
#!/bin/bash 

# readlink gets us the full path to the file. This is necessary for docker
readlinker=`readlink -f $1`
# returns just the directory name
pathname=`dirname $readlinker`
# basename will strip off the directory name and the extension
name=`basename $1 .las`

# Docker run allows us to leverage a pdal machine with pcl built in,
# thus allowing us to calculate height.
# See http://www.pdal.io/tutorial/calculating-normalized-heights.html
docker run -v $pathname:/data pdal/master pdal translate //data/"$name".las //data/"$name"_height.bpf height --writers.bpf.output_dims="X,Y,Z,Intensity,ReturnNumber,NumberOfReturns,ScanDirectionFlag,EdgeOfFlightLine,Classification,ScanAngleRank,UserData,PointSourceId,Height";

# Now we split the lidar data into slices of heights, from 0-1.5 ft, etc.
# on up to 200 feet. We're working in the Midwest, so we don't anticipate
# trees much taller than ~190 feet
for START in 0:1.5 1.5:3 3:6 6:15 15:30 30:45 45:60 60:105 105:150 150:200
 do
  # We'll use the height classes to name our output files and tablename.
  # A little cleanup is necessary, so we're removing the colon ":".
  nameend=`echo $START | sed s/:/-/g`

  # Name our output
  bpfname=$name"_"$nameend.bpf

  # Implement the height range filter
  pdal translate $name"_height".bpf $bpfname -f range --filters.range.limits="Height[$START)"

  # Now we put our data in the PostgreSQL database.
  pdal pipeline -i pipeline.xml --writers.pgpointcloud.table='layer_'$nameend --readers.bpf.filename=$bpfname --writers.pgpointcloud.overwrite='false'
done

Now, we can use parallel to make this run a little faster:

find . -name "*.las" | parallel -j6 ./pdal_processor.sh {}&

Sadly, we can run into issues in running this in parallel:

PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.


PDAL: ERROR:  duplicate key value violates unique constraint "pointcloud_formats_pkey"
DETAIL:  Key (pcid)=(1) already exists.

This issue is a one time issue, however — we just can’t parallelize table creation. Once the tables are created however, I believe we can parallelize without issue. I’ll report if I find otherwise.

Posted in 3D, Database, Docker, LiDAR, Other, PDAL, pointcloud, PostGIS, PostgreSQL | Tagged: , , , , , | Leave a Comment »

Taking Slices from LiDAR data: Part IV

Posted by smathermather on February 5, 2016

Trans-Alaska Pipeline System Luca Galuzzi 2005

In PDAL, a pipeline file can be used to do a variety of operations. Within the following context, I think of a pipeline file like an ad hoc preferences file, where I can use an external command to iterate through the things I want to change, while holding constant everything else in the pipeline file.

In my use case for this vignette, I’ll use the pipeline file to hold my database preferences for putting the point clouds into my PostGIS database. For the record, I’m using vpicavet’s docker pggis as the starting place for installing PostGIS with the pgpointcloud extension. I have adapted the following pipeline file from the PDAL writers.pgpointcloud example.

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="writers.pgpointcloud">
    <Option name="connection">
      host='localhost' dbname=‘lidar’ user='user' password=‘password’
    </Option>
    <Option name="table">54001640PAN_heightasz_0-1.5</Option>
    <Option name="compression">dimensional</Option>
<!--    <Option name="srid">3270</Option> -->
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <Reader type="readers.las">
          <Option name="filename">54001640PAN_heightasz_0-1.5.las</Option>
<!--          <Option name="spatialreference">EPSG:3270</Option> -->
      </Reader>
    </Filter>
  </Writer>
</Pipeline>

Some things to note. I have commented out the SRID and readers.las.spatialreference in the XML above. We’ll rely on PDAL to discover this, and use the default output of epsg:4326 to store our point clouds for the time being.

Our wrapper script for the pipeline file is very simple. We will use the wrapper script to specify the table and the input file name.

#!/bin/bash

TABLE=`basename $1 .bpf`
INPUT=$1

#echo $TABLE
pdal pipeline -i pipeline.xml --writers.pgpointcloud.table=$TABLE --readers.bpf.filename=$INPUT

Now to use, we’ll use GNU Parallel, as much for it’s XArgs like functionality as scalability:

ls *.bpf | parallel -j6 ./pipeliner.sh {}

Now we can see what tables got loaded:

psql -d lidar
psql (9.5.0)
Type "help" for help.

lidar=# \dt
                    List of relations
 Schema |             Name              | Type  |  Owner  
--------+-------------------------------+-------+---------
 public | 54001640PAN_heightasz_0-1.5   | table | user
 public | 54001640PAN_heightasz_1.5-6   | table | user
 public | 54001640PAN_heightasz_100-200 | table | user
 public | 54001640PAN_heightasz_30-45   | table | user
 public | 54001640PAN_heightasz_45-55   | table | user
 public | 54001640PAN_heightasz_55-65   | table | user
 public | 54001640PAN_heightasz_6-30    | table | user
 public | 54001640PAN_heightasz_65-75   | table | user
 public | 54001640PAN_heightasz_75-85   | table | user
 public | 54001640PAN_heightasz_85-100  | table | user
 public | pointcloud_formats            | table | user
 public | spatial_ref_sys               | table | user
(12 rows)

W00t! We’ve got point clouds in our database! Next, we will visualize the data, and extract some analyses from it.

Posted in 3D, LiDAR, Other, PDAL, pointcloud | Tagged: , , | Leave a Comment »