Smathermather's Weblog

Remote Sensing, GIS, Ecology, and Oddball Techniques

Quick (and likely apocryphal) post on versioning and databases

Posted by smathermather on November 15, 2014

This is a quick blog post about technologies that I don’t know well… so please comment if you know better. GeoGig and dat are great tools for addressing versioning in data, so what’s the difference?


GeoGig is written in Java and is meant for any “simple features” geometry (points, lines, polygons).

Its strength is that it is built from the ground up to handle geometries well, going beyond CRUD functions to address geospatial-specific problems in versioning, such as changes within a feature’s geometry. Think of it as git for geospatial data.
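To make “beyond CRUD” a little more concrete, here is a minimal Python sketch of what a feature-aware diff might report. It assumes a toy feature model (a hypothetical `Feature` class holding an id, attributes, and one polygon ring as a list of vertices) and a hypothetical `diff_features` helper. It illustrates the idea only; it is not GeoGig’s actual data model or API.

```python
from dataclasses import dataclass

# Hypothetical toy model; NOT GeoGig's internal format.
@dataclass
class Feature:
    fid: str
    attrs: dict
    ring: list  # polygon ring as a list of (x, y) vertices

def diff_features(old: Feature, new: Feature) -> dict:
    """Report what changed inside one feature, not just that it changed."""
    keys = sorted(set(old.attrs) | set(new.attrs))
    changed_attrs = {
        k: (old.attrs.get(k), new.attrs.get(k))
        for k in keys
        if old.attrs.get(k) != new.attrs.get(k)
    }
    # Naive position-by-position comparison of vertices; inserted or deleted
    # vertices would need a smarter alignment step in a real tool.
    moved = [i for i, (a, b) in enumerate(zip(old.ring, new.ring)) if a != b]
    return {
        "attrs": changed_attrs,
        "moved_vertices": moved,
        "vertex_count": (len(old.ring), len(new.ring)),
    }

# Example: one corner of a square parcel is moved; attributes are unchanged.
before = Feature("parcel-1", {"owner": "A"}, [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
after = Feature("parcel-1", {"owner": "A"}, [(0, 0), (1, 0), (1, 2), (0, 1), (0, 0)])
result = diff_features(before, after)
```

Here the diff says that vertex 2 moved, rather than only reporting that the whole row changed. Comparing vertices by position is the simplest possible choice and is only meant to show the kind of information a geometry-aware tool can surface.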

There’s a hosted version from BoundlessGeo, currently in pre-release, called Versio, which is meant to be the GitHub for geospatial data. You can run your own local version from http://geogig.org/

From the website:

“Users are able to import raw geospatial data (currently from Shapefiles, PostGIS or SpatiaLite) in to (sic) a repository where every change to the data is tracked. These changes can be viewed in a history, reverted to older versions, branched in to sandboxed areas, merged back in, and pushed to remote repositories.”
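The workflow in that quote (every change tracked, a viewable history, reverting, branching) follows git’s model of commits that point back to their parents. Below is a toy Python sketch of that model, assuming each snapshot is a plain JSON-serializable dict keyed by feature id. `TinyRepo` and its methods are hypothetical and only mirror the concepts; they are not GeoGig’s commands. Merging and pushing are left out to keep it short.

```python
import copy
import hashlib
import json

class TinyRepo:
    """Toy git-style history of feature snapshots (hypothetical; not GeoGig's API)."""

    def __init__(self):
        self.commits = {}  # commit id -> {"parent", "message", "snapshot"}
        self.branches = {"master": None}  # branch name -> tip commit id
        self.head = "master"  # currently checked-out branch

    def commit(self, snapshot: dict, message: str) -> str:
        parent = self.branches[self.head]
        payload = json.dumps(
            {"parent": parent, "message": message, "snapshot": snapshot},
            sort_keys=True,
        )
        # Content-addressed id (shortened for display): the same parent,
        # message and data always hash to the same id.
        cid = hashlib.sha1(payload.encode()).hexdigest()[:10]
        self.commits[cid] = {
            "parent": parent,
            "message": message,
            "snapshot": copy.deepcopy(snapshot),
        }
        self.branches[self.head] = cid
        return cid

    def log(self) -> list:
        """Walk parent links from the current branch tip back to the first commit."""
        history, cid = [], self.branches[self.head]
        while cid is not None:
            history.append((cid, self.commits[cid]["message"]))
            cid = self.commits[cid]["parent"]
        return history

    def snapshot_at(self, cid: str) -> dict:
        return copy.deepcopy(self.commits[cid]["snapshot"])

    def revert_to(self, cid: str) -> str:
        """Restore an older state as a NEW commit, so no history is lost."""
        return self.commit(self.snapshot_at(cid), f"revert to {cid}")

    def branch(self, name: str) -> None:
        """Start a sandbox branch at the current tip and switch to it."""
        self.branches[name] = self.branches[self.head]
        self.head = name

# Example workflow
repo = TinyRepo()
c1 = repo.commit({"parcel-1": {"owner": "A"}}, "import parcels")
c2 = repo.commit({"parcel-1": {"owner": "B"}}, "change owner")
c3 = repo.revert_to(c1)
repo.branch("sandbox")
c4 = repo.commit({"parcel-1": {"owner": "C"}}, "sandbox edit")
```

Reverting records a new commit instead of rewriting history, and the sandbox branch moves independently while `master` stays at the reverted state.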

—————————————————————————
Ok, so how about dat?


dat is written in JavaScript, is designed for streaming data (among some other cool features), and does CRUD versioning. Think of it as git for data, built by web people. Accordingly, it “defines an API for reading, writing and syncing datasets”, rather than being a repository into which one imports data.

“Dat is an open source project that provides a streaming interface between every file format and data storage backend.”
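The “streaming interface” idea can be illustrated, by analogy only, with Python generators: rows are read from one format and written to another one at a time, so the whole dataset never has to sit in memory. The helper names are hypothetical and this is not dat’s API; CSV and newline-delimited JSON simply stand in for a “file format” and a “storage backend”.

```python
import csv
import io
import json
from typing import Iterable, Iterator, TextIO

def read_csv_rows(src: TextIO) -> Iterator[dict]:
    """Yield CSV rows one at a time; the file is never loaded whole."""
    yield from csv.DictReader(src)

def write_ndjson(rows: Iterable[dict], dst: TextIO) -> int:
    """Write each row as one JSON line as soon as it arrives; return the row count."""
    count = 0
    for row in rows:
        dst.write(json.dumps(row, sort_keys=True) + "\n")
        count += 1
    return count

# Example: stream a small in-memory CSV "file" into an NDJSON "backend".
src = io.StringIO("id,name\n1,oak\n2,maple\n")
dst = io.StringIO()
n = write_ndjson(read_csv_rows(src), dst)
```

Because the reader and writer only agree on “a stream of row dicts”, either side could be swapped for another format without touching the other, which is the general shape of the interface the quote describes.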

A cursory look indicates that it will work with geospatial data, but effectively as blobs: unlike GeoGig, it has no special handling for changes within features. But it does what GeoGig does not: it makes datasets automatically syncable.
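One way to picture “effectively as blobs” combined with syncing: hash each row as a single opaque value and derive create/update/delete operations from the differences between two copies of a dataset. This is a minimal Python sketch under that assumption; `row_digest` and `plan_sync` are hypothetical names, not part of dat.

```python
import hashlib
import json

def row_digest(row: dict) -> str:
    # Serialize the whole row, geometry included, and hash it as one opaque blob.
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def plan_sync(source: dict, target: dict) -> dict:
    """CRUD operations that would make `target` match `source`.

    Both arguments map a row key to a JSON-serializable row (hypothetical layout).
    """
    src = {k: row_digest(v) for k, v in source.items()}
    dst = {k: row_digest(v) for k, v in target.items()}
    return {
        "create": sorted(k for k in src if k not in dst),
        "update": sorted(k for k in src if k in dst and src[k] != dst[k]),
        "delete": sorted(k for k in dst if k not in src),
    }

# Example: p1 differs by a single vertex, p2 is new, p3 was removed.
source = {
    "p1": {"owner": "A", "ring": [[0, 0], [1, 0], [1, 1], [0, 0]]},
    "p2": {"owner": "B", "ring": [[2, 2], [3, 2], [3, 3], [2, 2]]},
}
target = {
    "p1": {"owner": "A", "ring": [[0, 0], [1, 0], [1, 2], [0, 0]]},
    "p3": {"owner": "C", "ring": [[5, 5], [6, 5], [6, 6], [5, 5]]},
}
plan = plan_sync(source, target)
```

The moved vertex in `p1` shows up only as “update p1”: the plan cannot say which part of the geometry changed, which is exactly the trade-off between row-level CRUD syncing and GeoGig’s feature-aware approach.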

Like all projects, each has its strengths. Choose your project wisely.

9 Responses to “Quick (and likely apocryphal) post on versioning and databases”

  1. Thanks for this post; I’ve been wondering whether dat can help me bridge the gap between SQL Server and PostGIS… but I can’t figure out if that’s what it’s for, or if it’s a tool that only helps developers access these different formats from their apps…

    Will watch for more comments / info!

    • It seems like a good tool for abstracting away the problems of sharing data between database types. IDK how far along their SQL Server support is; I haven’t looked.

  2. thedeer said

    It would be great if dat were able to support geospatial data in the future. The syncing feature sounds neat. I’ve yet to use it, though. Thanks for the post!

    • dat supports geospatial data, but you won’t get into the weeds on within-row edits in any meaningful way. And since a single row of geospatial data can contain so much geometric complexity, the CRUD approach has its limits. That said, for synchronizing data across several different mediums (setting the above aside), it seems to hold great potential.

  3. I think it’s a fair summary. Disclaimer: I was the CTO of Boundless until recently and was very involved with GeoGig and Versio up until a few weeks ago. In my current job dat is already on the radar for some of the problems we will need to solve, so I think I have a somewhat unique perspective (but I need more time with dat to form a solid opinion).

    My current thought is that GeoGig is very specialized and targets geo only. Yes, it could be extended to support more generic datasets, but nothing is happening along those lines in the project. GeoGig’s design is very focused on translating the Git workflow to geospatial, on the command line. Building a sync feature on top of GeoGig is possible and actually not too hard. Dat, on the other hand, seems to take a more generic approach to data: it has more breadth and makes fewer assumptions, which probably means that it can’t yet handle large geospatial datasets but is able to work with a wider variety of information. Geospatial data rarely travels alone, so having a tool that can version any data is a huge plus. I think some of the streaming features of dat will prove to be a huge advantage in the medium to long run (thinking IoT, sensors and the like here).
