Related Plugins and Tags

QGIS Planet

Tracking geoprocessing workflows with QGIS & DVC

Today’s post is a geeky deep dive into how to leverage DVC (not just) data version control to track QGIS geoprocessing workflows.

“Why is this great?” you may ask.

DVC tracks data, parameters, and code. If anything changes, we simply rerun the process and DVC will figure out which stages need to be recomputed and which can be skipped by re-using cached results.

This can lead to huge time savings compared to re-running the whole model

You can find the source code used in this post on my repo https://github.com/anitagraser/QGIS-resources/tree/dvc

I’m using DVC with the DVC plugin for VSCode but DVC can be used completely from the command line, if you prefer this appraoch.

Basically, what follows is a proof of concept: converting a QGIS Processing model to a DVC workflow. In the following screenshot, you can see the main stages

  1. The QGIS model in the upper left corner
  2. The Python script exported from the QGIS model builder in the lower left corner
  3. The DVC stages in my dvc.yaml file in the upper right corner (And please ignore the hello world stage. It’s a left over from my first experiment)
  4. The DVC DAG visualizing the sequence of stages. Looks similar to the QGIS model, doesn’t it ;-)

Besides the stage definitions in dvc.yaml, there’s a parameters file:

random-points:
  n: 10
buffer-points:
  size: 0.5

And, of course, the two stages, each as it’s own Python script.

First, random-points.py which reads the random-points.n parameter to create the desired number of points within the polygon defined in qgis3/data/test.geojson:

import dvc.api

from qgis.core import QgsVectorLayer
from processing.core.Processing import Processing
import processing

Processing.initialize()

params = dvc.api.params_show()
pts_n = params['random-points']['n']

input_vector = QgsVectorLayer("qgis3/data/test.geojson")
output_filename = "qgis3/output/random-points.geojson"

alg_params = {
    'INCLUDE_POLYGON_ATTRIBUTES': True,
    'INPUT': input_vector,
    'MAX_TRIES_PER_POINT': 10,
    'MIN_DISTANCE': 0,
    'MIN_DISTANCE_GLOBAL': 0,
    'POINTS_NUMBER': pts_n,
    'SEED': None,
    'OUTPUT': output_filename
}
processing.run('native:randompointsinpolygons', alg_params)

And second, buffer-points.py which reads the buffer-points.size parameter to buffer the previously generated points:

import dvc.api
import geopandas as gpd
import matplotlib.pyplot as plt

from qgis.core import QgsVectorLayer
from processing.core.Processing import Processing
import processing

Processing.initialize()

params = dvc.api.params_show()
buffer_size = params['buffer-points']['size']

input_vector = QgsVectorLayer("qgis3/output/random-points.geojson")
output_filename = "qgis3/output/buffered-points.geojson"

alg_params = {
    'DISSOLVE': False,
    'DISTANCE': buffer_size,
    'END_CAP_STYLE': 0,  # Round
    'INPUT': input_vector,
    'JOIN_STYLE': 0,  # Round
    'MITER_LIMIT': 2,
    'SEGMENTS': 5,
    'OUTPUT': output_filename
}
processing.run('native:buffer', alg_params)

gdf = gpd.read_file(output_filename)
gdf.plot()

plt.savefig('qgis3/output/buffered-points.png')

With these things in place, we can use dvc to run the workflow, either from within VSCode or from the command line. Here, you can see the workflow (and how dvc skips stages and fetches results from cache) in action:

If you try it out yourself, let me know what you think.

Visualizing trajectories with QGIS & MobilityDB

In the previous post, we — creatively ;-) — used MobilityDB to visualize stationary IOT sensor measurements.

This post covers the more obvious use case of visualizing trajectories. Thus bringing together the MobilityDB trajectories created in Detecting close encounters using MobilityDB 1.0 and visualization using Temporal Controller.

Like in the previous post, the valueAtTimestamp function does the heavy lifting. This time, we also apply it to the geometry time series column called trip:

SELECT mmsi,
    valueAtTimestamp(trip, '2017-05-07 08:55:40') geom,
    valueAtTimestamp(SOG, '2017-05-07 08:55:40') SOG
FROM "public"."ships"

Using this SQL query, we again set up a — not yet Temporal Controller-controlled — QueryLayer.

To configure Temporal Controller to update the timestamp in our SQL query, we again need to run the Python script from the previous post.

With this done, we are all set up to animate and explore the movement patterns in our dataset:


This post is part of a series. Read more about movement data in GIS.

Visualizing IOT time series with QGIS & MobilityDB

Today’s post presents an experiment in modelling a common scenario in many IOT setups: time series of measurements at stationary sensors. The key idea I want to explore is to use MobilityDB’s temporal data types, in particular the tfloat_inst and tfloat_seq for instances and sequences of temporal float values, respectively.

For info on how to set up MobilityDB, please check my previous post.

Setting up our DB tables

As a toy example, let’s create two IOT devices (in table iot_devices) with three measurements each (in table iot_measurements) and join them to create the tfloat_seq (in table iot_joined):

CREATE TABLE iot_devices (
    id integer,
    geom geometry(Point, 4326)
);

INSERT INTO iot_devices (id, geom) VALUES
(1, ST_SetSRID(ST_MakePoint(1,1), 4326)),
(2, ST_SetSRID(ST_MakePoint(2,3), 4326));

CREATE TABLE iot_measurements (
    device_id integer,
    t timestamp,
    measurement float
);

INSERT INTO iot_measurements (device_id, t, measurement) VALUES
(1, '2022-10-01 12:00:00', 5.0),
(1, '2022-10-01 12:01:00', 6.0),
(1, '2022-10-01 12:02:00', 10.0),
(2, '2022-10-01 12:00:00', 9.0),
(2, '2022-10-01 12:01:00', 6.0),
(2, '2022-10-01 12:02:00', 1.5);

CREATE TABLE iot_joined AS
SELECT 
    dev.id, 
    dev.geom, 
    tfloat_seq(array_agg(
        tfloat_inst(m.measurement, m.t) ORDER BY t
    )) measurements
FROM iot_devices dev 
JOIN iot_measurements m
  ON dev.id = m.device_id
GROUP BY dev.id, dev.geom;

We can load the resulting layer in QGIS but QGIS won’t be happy about the measurements column because it does not recognize its data type:

Query layer with valueAtTimestamp

Instead, what we can do is create a query layer that fetches the measurement value at a specific timestamp:

SELECT id, geom, 
    valueAtTimestamp(measurements, '2022-10-01 12:02:00') 
FROM iot_joined

Which gives us a layer that QGIS is happy with:

Time for TemporalController

Now the tricky question is: how can we wire our query layer to the Temporal Controller so that we can control the timestamp and animate the layer?

I don’t have a GUI solution yet but here’s a way to do it with PyQGIS: whenever the Temporal Controller signal updateTemporalRange is emitted, our update_query_layer function gets the current time frame start time and replaces the datetime in the query layer’s data source with the current time:

l = iface.activeLayer()
tc = iface.mapCanvas().temporalController()

def update_query_layer():
    tct = tc.dateTimeRangeForFrameNumber(tc.currentFrameNumber()).begin().toPyDateTime()
    s = l.source()
    new = re.sub(r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", str(tct), s)
    l.setDataSource(new, l.sourceName(), l.dataProvider().name())

tc.updateTemporalRange.connect(update_query_layer)

Future experiments will have to show how this approach performs on lager datasets but it’s exciting to see how MobilityDB’s temporal types may be visualized in QGIS without having to create tables/views that join a geometry to each and every individual measurement.

Detecting close encounters using MobilityDB 1.0

It’s been a while since we last talked about MobilityDB in 2019 and 2020. Since then, the project has come a long way. It joined OSGeo as a community project and formed a first PSC, including the project founders Mahmoud Sakr and Esteban Zimányi as well as Vicky Vergara (of pgRouting fame) and yours truly.

This post is a quick teaser tutorial from zero to computing closest points of approach (CPAs) between trajectories using MobilityDB.

Setting up MobilityDB with Docker

The easiest way to get started with MobilityDB is to use the ready-made Docker container provided by the project. I’m using Docker and WSL (Windows Subsystem Linux on Windows 10) here. Installing WLS/Docker is out of scope of this post. Please refer to the official documentation for your operating system.

Once Docker is ready, we can pull the official container and fire it up:

docker pull mobilitydb/mobilitydb
docker volume create mobilitydb_data
docker run --name "mobilitydb" -d -p 25432:5432 -v mobilitydb_data:/var/lib/postgresql mobilitydb/mobilitydb
psql -h localhost -p 25432 -d mobilitydb -U docker

Currently, the container provides PostGIS 3.2 and MobilityDB 1.0:

Loading movement data into MobilityDB

Once the container is running, we can already connect to it from QGIS. This is my preferred way to load data into MobilityDB because we can simply drag-and-drop any timestamped point layer into the database:

For this post, I’m using an AIS data sample in the region of Gothenburg, Sweden.

After loading this data into a new table called ais, it is necessary to remove duplicate and convert timestamps:

CREATE TABLE AISInputFiltered AS
SELECT DISTINCT ON("MMSI","Timestamp") *
FROM ais;

ALTER TABLE AISInputFiltered ADD COLUMN t timestamp;
UPDATE AISInputFiltered SET t = "Timestamp"::timestamp;

Afterwards, we can create the MobilityDB trajectories:

CREATE TABLE Ships AS
SELECT "MMSI" mmsi,
tgeompoint_seq(array_agg(tgeompoint_inst(Geom, t) ORDER BY t)) AS Trip,
tfloat_seq(array_agg(tfloat_inst("SOG", t) ORDER BY t) FILTER (WHERE "SOG" IS NOT NULL) ) AS SOG,
tfloat_seq(array_agg(tfloat_inst("COG", t) ORDER BY t) FILTER (WHERE "COG" IS NOT NULL) ) AS COG
FROM AISInputFiltered
GROUP BY "MMSI";

ALTER TABLE Ships ADD COLUMN Traj geometry;
UPDATE Ships SET Traj = trajectory(Trip);

Once this is done, we can load the resulting Ships layer and the trajectories will be loaded as lines:

Computing closest points of approach

To compute the closest point of approach between two moving objects, MobilityDB provides a shortestLine function. To be correct, this function computes the line connecting the nearest approach point between the two tgeompoint_seq. In addition, we can use the time-weighted average function twavg to compute representative average movement speeds and eliminate stationary or very slowly moving objects:

SELECT S1.MMSI mmsi1, S2.MMSI mmsi2, 
       shortestLine(S1.trip, S2.trip) Approach,
       ST_Length(shortestLine(S1.trip, S2.trip)) distance
FROM Ships S1, Ships S2
WHERE S1.MMSI > S2.MMSI AND
twavg(S1.SOG) > 1 AND twavg(S2.SOG) > 1 AND
dwithin(S1.trip, S2.trip, 0.003)

In the QGIS Browser panel, we can right-click the MobilityDB connection to bring up an SQL input using Execute SQL:

The resulting query layer shows where moving objects get close to each other:

To better see what’s going on, we’ll look at individual CPAs:

Having a closer look with the Temporal Controller

Since our filtered AIS layer has proper timestamps, we can animate it using the Temporal Controller. This enables us to replay the movement and see what was going on in a certain time frame.

I let the animation run and stopped it once I spotted a close encounter. Looking at the AIS points and the shortest line, we can see that MobilityDB computed the CPAs along the trajectories:

A more targeted way to investigate a specific CPA is to use the Temporal Controllers’ fixed temporal range mode to jump to a specific time frame. This is helpful if we already know the time frame we are interested in. For the CPA use case, this means that we can look up the timestamp of a nearby AIS position and set up the Temporal Controller accordingly:

More

I hope you enjoyed this quick dive into MobilityDB. For more details, including talks by the project founders, check out the project website.


This post is part of a series. Read more about movement data in GIS.

Forget label buffers! Better maps with selective label masks in QGIS

Cartographers use all kind of tricks to make their maps look deceptively simple. Yet, anyone who has ever tried to reproduce a cartographer’s design using only automatic GIS styling and labeling knows that the devil is in the details.

This post was motivated by Mika Hall’s retro map style.

There are a lot of things going on in this design but I want to draw your attention to the labels – and particularly their background:

Detail of Mike’s map (c) Mike Hall. You can see that the rail lines stop right before they would touch the A in Valencia (or any other letters in the surrounding labels).

This kind of effect cannot be achieved by good old label buffers because no matter which color we choose for the buffer, there will always be cases when the chosen color is not ideal, for example, when some labels are on land and some over water:

Ordinary label buffers are not always ideal.

Label masks to the rescue!

Selective label masks enable more advanced designs.

Here’s how it’s done:

Selective masking has actually been around since QGIS 3.12. There are two things we need to take care of when setting up label masks:

1. First we need to enable masks in the label settings for all labels we want to mask (for example the city labels). The mask tab is conveniently located right next to the label buffer tab:

2. Then we can go to the layers we want to apply the masks to (for example the railroads layer). Here we can configure which symbol layers should be affected by which mask:

Note: The order of steps is important here since the “Mask sources” list will be empty as long as we don’t have any label masks enabled and there is currently no help text explaining this fact.

I’m also using label masks to keep the inside of the large city markers (the ones with a star inside a circle) clear of visual clutter. In short, I’m putting a circle-shaped character, such as ◍, over the city location:

In the text tab, we can specify our one-character label and – later on – set the label opacity to zero.
To ensure that the label stays in place, pick the center placement in “Offset from Point” mode.

Once we are happy with the size and placement of this label, we can then reduce the label’s opacity to 0, enable masks, and configure the railroads layer to use this mask.

As a general rule of thumb, it makes sense to apply the masks to dark background features such as the railways, rivers, and lake outlines in our map design:

Resulting map with label masks applied to multiple labels including city and marine area labels masking out railway lines and ferry connections as well as rivers and lake outlines.

If you have never used label masks before, I strongly encourage you to give them a try next time you work on a map for public consumption because they provide this little extra touch that is often missing from GIS maps.

Happy QGISing! Make maps not war.

Official Austrian basemap and cadastre vector tiles

The BEV (Austrian Bundesamt für Eich- und Vermessungswesen) has recently published the Austrian cadastre as open data:

The URLs for vector tiles and styles can be found on https://kataster.bev.gv.at under Guide – External

The vector tile URL is:

https://kataster.bev.gv.at/tiles/{kataster | symbole}/{z}/{x}/{y}.pbf

There are 4 different style variations:

https://kataster.bev.gv.at/styles/{kataster | symbole}/style_{vermv | ortho | basic | gis}.json

When configuring the vector tiles in QGIS, we specify the desired tile and style URLs, for example:

For example, this is the “gis” style:

And this is the “basic” style:

The second vector tile source I want to mention is basemap.at. It has been around for a while, however, early versions suffered from a couple of issues that have now been resolved.

The basemap.at project provides extensive documentation on how to use the dataset in QGIS and other GIS, including manuals and sample projects:

Here’s the basic configuration: make sure to set the max zoom level to 16, otherwise, the map will not be rendered when you zoom in too far.

The level of detail is pretty impressive, even if it cannot quite keep up with the basemap raster tiles:

Vector tile details at Resselpark, Vienna
Raster basemap details at Resselpark, Vienna

Building an interactive app with geocoding in Jupyter Lab

This post aims to show you how to create quick interactive apps for prototyping and data exploration using Panel.

Specifically, the following example demos how to add geocoding functionality based on Geopy and Nominatim. As such, this example brings together tools we’ve previously touched on in Super-quick interactive data & parameter exploration and Geocoding with Geopy.

Here’s a quick preview of the resulting app in action:

To create this app, I defined a single function called my_plot which takes the address and desired buffer size as input parameters. Using Panel’s interact and servable methods, I’m then turning this function into the interactive app you’ve seen above:

import panel as pn
from geopy.geocoders import Nominatim
from utils.converting import location_to_gdf
from utils.plotting import hvplot_with_buffer

locator = Nominatim(user_agent="OGD.AT-Lab")

def my_plot(user_input="Giefinggasse 2, 1210 Wien", buffer_meters=1000):
    location = locator.geocode(user_input)
    geocoded_gdf = location_to_gdf(location, user_input)
    map_plot = hvplot_with_buffer(geocoded_gdf, buffer_meters, 
                                  title=f'Geocoded address with {buffer_meters}m buffer')
    return map_plot.opts(active_tools=['wheel_zoom']) 

kw = dict(user_input="Giefinggasse 2, 1210 Wien", buffer_meters=(0,10000))

pn.template.FastListTemplate(
    site="Panel", title="Geocoding Demo", 
    main=[pn.interact(my_plot, **kw)]
).servable();

You can find the full notebook in the OGD.AT Lab repository or run this notebook directly on MyBinder:

To open the Panel preview, press the green Panel button in the Jupyter Lab toolbar:

I really enjoy building spatial data exploration apps this way, because I can start off with a Jupyter notebook and – once I’m happy with the functionality – turn it into a pretty app that provides a user-friendly exterior and hides the underlying complexity that might scare away stakeholders.

Give it a try and share your own adventures. I’d love to see what you come up with.

Dynamic Infographic Map Tutorial

This is a guest post by Mickael HOARAU @Oneil974

As an update of the tutorial from previous years, I created a tutorial showing how to make a simple and dynamic color map with charts in QGIS.

In this tutorial you can see some of interesting features of QGIS and its community plugins. Here you’ll see variables, expressions, filters, QuickOSM and DataPlotly plugins and much more. You just need to use QGIS 3.24 Tisler version.

Here is the tutorial.

New OGC Moving Features JSON support in MovingPandas

First time, we talked about the OGC Moving Features standard in a post from 2017. Back then, we looked at the proposed standard way to encode trajectories in CSV and discussed its issues. Since then, the Moving Features working group at OGC has not been idle. Besides the CSV and XML encodings, they have designed a new JSON encoding that addresses many of the downsides of the previous two. You can read more about this in our 2020 preprint “From Simple Features to Moving Features and Beyond”.

Basically Moving Features JSON (MF-JSON) is heavily inspired by GeoJSON and it comes with a bunch of mandatory and optional key/value pairs. There is support for static properties as well as dynamic temporal properties and, of course, temporal geometries (yes geometries, not just points).

I think this format may have an actual chance of gaining more widespread adoption.

Image source: http://www.opengis.net/doc/BP/mf-json/1.0

Inspired by Pandas.read_csv() and GeoPandas.read_file(), I’ve started implementing a read_mf_json() function in MovingPandas. So far, it supports basic MovingFeature JSONs with MovingPoint geometry:

You’ll need to use the current development version to test this feature.

Next steps will be MovingFeatureCollection JSONs and support for static as well as temporal properties. We’ll have to see if MovingPandas can be extended to go beyond moving point geometries. Storing moving linestrings and polygons in the GeoDataFrame will be the simple part but analytics and visualization will certainly be more tricky.


This post is part of a series. Read more about movement data in GIS.

2021 wrap-up: baba und foi net!

What a ride.

This year has been both extremely rewarding and incredibly frustrating, sometimes both in very short succession.

I’ve finally finished my PhD dissertation and – between movement data analysis and open source and open data science talks in general – I’ve been counting over ten invited talks and conference presentations, including keynotes at FOSS4G and GI_Forum. Unfortunately, all of these were limited to virtual experiences and therefore often lacked much of the social interaction off stage that usually makes giving talks rewarding. But FOSS4G2021 was a refreshing exception to this rule:

One of the rare in-person events I attended this year was the Futurezone 2020 Award ceremony which – of course – had been postponed to 2021. The award took me completely by surprise:

This year’s focus on talks meant that there haven’t been many blog posts with original content this year, an unfortunate situation I hope to improve in 2022.

Without wanting to promise to much, there are quite a few interesting MovingPandas collaborations in the works that will hopefully result in exciting new features, demos, and tutorials:

On the plus side, with so many virtual events – from conferences to community events such as QGIS Open Days – much of the content formerly exclusively available to participants on-site have been recorded. Some worthwhile accounts and playlists include:

Happy streaming, happy new year, and – if you can – get vaccinated!

I’ll be having a nerd party.

New MovingPandas website

The last couple of days, I have been hacking away to improve the online presence of MovingPandas.

The new home page aims to be the central landing page that provides direct links to all important resources: from source code on Github, to documentation on ReadTheDocs, and – most importantly – all the tutorial and analysis example notebooks:

Additionally, all tutorial and analysis example notebooks now contain direct links to live versions on MyBinder, sources on Github and already executed pre-rendered HTML versions of the notebooks for quick browsing:

If you are using MovingPandas, I’d love to hear about it, particularly if you want to share one of your analysis examples with the community.

Open source for open spatial data science

Thanks to the FOSS4G2021 video team, all talks including my keynote are now available online.

I had the honor to be invited to give the closing keynote, talking about how open source can help open science, particularly data science:

I’m convinced that efforts towards more open data science are a worthwhile investment even if current scientific incentive structures are stacked against it.

Until incentive policies catch up, we all can help encourage more people to go the extra mile(s) by properly valuing their efforts, e.g. by celebrating and citing reproducible publications, open research datasets, and open scientific software.

Exploring ZAMG’s new open weather data

The Central Institution for Meteorology and Geodynamics (ZAMG) is Austrian’s meteorological and geophysical service. And as such, they have a large database of historical weather data which they have now made publicly available, as announced on 28th Oct 2021:

The new ZAMG Data Hub provides weather and station data, mainly in NetCDF and CSV formats:

I decided to grab a NetCDF sample from their analysis and nowcasting system INCA. I went with all available parameters for a period of one day (the data has a temporal resolution of one hour) and a bounding box around Vienna:

https://frontend.hub.zamg.ac.at/grid/d512d5b5-4e9f-4954-98b9-806acbf754f6/historical/form?anonymous=true

The loading screen of QGIS 3.22 shows the different NetCDF layers:

After adding the incal-hourly layer to QGIS, the layer styling panel provides access to the different weather parameters. We can switch between these parameters by clicking the gradient icon next to the parameter names. Here you can see the air temperature:

And because the NetCDF layer is time-aware, we can also use the QGIS Temporal Controller to step through the hourly measurements / create an animation:

Make sure to grab the latest version of QGIS to get access to all the functionality shown here.

Movement data in GIS #36: trucks from space

Can we reliably measure truck traffic from space? Compared to private transport, spatiotemporal data on freight transport is even harder to come by. Detecting trucks using remote sensing has been a promising lead for many years but often required access to pretty specialized sensors, such as TerraSAR-X. That is why I was really excited to read about a new approach that detects trucks in commonly available Sentinel-2 imagery developed by Henrik Fisser (Julius-Maximilians-University Würzburg, Germany). So I reached out to him to learn more about the possibilities this new technology opens up. 

Vehicles are visible and detectable in Sentinel-2 data if they are large and moving fast enough (image source: ESA)

To verify his truck detection results. Henrik had already used data from truck counting stations along the German autobahn network. However, these counters are quite rare and thus cannot provide full spatial coverage. Therefore we started looking for more complete reference data. Fortunately, Nikolaus Kapser at the Austrian highway corporation ASFINAG offered his help. The Austrian autobahn toll system is gantry-based. It records when a truck passes a gantry. Using the timestamp of these truck passages and the current traffic speed, it is possible to estimate truck locations at arbitrary points in time, such as the time a Sentinel-2 image was taken. This makes it possible to assess the Sentinel-2-based truck detection along the autobahn network for complete Sentinel-2 images.

Overall, Sentinel-2-based detections tend to underestimate the number of trucks. Henrik found a strong correlation (with an average r value > 0.8) between German traffic counting stations and trucks detected by the Sentinel-2 method. These counting stations were selected for their ideal characteristics, including distance from volatile traffic situations such as a high number of highway intersections. This is very different from our comparison which covers autobahn sections in and near Vienna. We therefore expected larger detection errors. However, our new Austrian analysis reaches similar results (with r values of 0.79, 0.70, and 0.86 for three different days 2020-08-28, 2020-09-22, and 2020-11-06).

Thanks to the truck reference locations provided by ASFINAG, we were also able to analyze the spatial distribution of truck detections. We decided to compare ASFINAG data (truth) and Sentinel-2-based detections using a grid based approach with a cell size of 5×5 km. Confirming Henrik’s original results, grid cells with higher detection than ground truth values are clearly in the minority. Interestingly, many cells in Vienna (at the eastern border of the image extent) exhibit rather low relative errors compared to, for example, the cells along Westautobahn (the east-west running autobahn in the center of the image extent).

Some important remarks: The Sentinel-2-based detection method only works for large vehicles moving around 50km/h or faster. It is hence less suited to detect trucks in city traffic. Additionally, trucks in tunnel sections cannot be detected. To enable a fair comparison, we therefore flagged trucks in the ground truth dataset that were located in tunnels and excluded them from the analysis. Sentinel-2 captures the region around Vienna around 10:00 o’clock in the morning. As a result, it is not possible to assess other times of day. Finally, cloud cover will reduce the accuracy. Therefore we picked images with low reported cloud cover percentage (< 5%).

It is really exciting to finally see a truck detection method that works with readily available remote sensing data because this means that it is potentially transferable to other areas of the world where no official traffic counts are available. Furthermore, this method should be in line with data protection regulations (avoiding identification of individuals and potential reconstruction of movement trajectories) thus making it possible to use and publish the resulting data without further anonymization steps.


This post was written in collaboration with Henrik Fisser (Uni Würzburg / DLR) and Nikolaus Kasper (Asfinag MSG). Keep your eyes open for upcoming detailed publications on the Sentinel-2-based method by Henrik.


This post is part of a series. Read more about movement data in GIS.

Video recommendations from FOSDEM 2021

The Geospatial Dev Room at FOSDEM 2021 was a great event that (virtually) brought together a very diverse group of geo people.

All talk recordings are now available publicly at: fosdem.org/2021/schedule/track/geospatial

In line with the main themes of this blog, I’d particularly like to highlight the following three talks:

MoveTK: the movement toolkit A library for understanding movement by Aniket Mitra

Telegram Bot For Navigation: A perfect map app for a neighbourhood doesn’t need a map by Ilya Zverev

Spatial data exploration in Jupyter notebooks by yours truly

Introducing the open data analysis OGD.AT Lab

Data sourcing and preparation is one of the most time consuming tasks in many spatial analyses. Even though the Austrian data.gv.at platform already provides a central catalog, the individual datasets still vary considerably in their accessibility or readiness for use.

OGD.AT Lab is a new repository collecting Jupyter notebooks for working with Austrian Open Government Data and other auxiliary open data sources. The notebooks illustrate different use cases, including so far:

  1. Accessing geodata from the city of Vienna WFS
  2. Downloading environmental data (heat vulnerability and air quality)
  3. Geocoding addresses and getting elevation information
  4. Exploring urban movement data

Data processing and visualization are performed using Pandas, GeoPandas, and Holoviews. GeoPandas makes it straighforward to use data from WFS. Therefore, OGD.AT Lab can provide one universal gdf_from_wfs() function which takes the desired WFS layer as an argument and returns a GeoPandas.GeoDataFrame that is ready for analysis:

Many other datasets are provided as CSV files which need to be joined with spatial datasets to use them in spatial analysis. For example, the “Urban heat vulnerability index” dataset which needs to be joined to statistical areas.

 

Another issue with many CSV files is that they use German number formatting, where commas are used as a decimal separater instead of dots:

Besides file access, there are also open services provided by other developers, for example, Manfred Egger developed an elevation service that provides elevation information for any point in Austria. In combination with geocoding services, such as Nominatim, this makes is possible to, for example, find the elevation for any address in Austria:

Last but not least, the first version of the mobility notebook showcases open travel time data provided by Uber Movement:

The utility functions for data access included in this repository will continue to grow as new data sources are included. Eventually, it may make sense to extract the data access function into a dedicated library, similar to geofi (Finland) or geobr (Brazil).

If you’re aware of any interesting open datasets or services that should be included in OGD.AT, feel free to reach out here or on Github through the issue tracker or by providing a pull request.

Sol Katz Award – Thank you

On Thursday, I was awarded the 2020 Sol Katz award for Free and Open Source Software for Geospatial. I feel very honored to have been selected for this award and I’d like to take this opportunity to share a few words of thanks:

As people working in open source projects, we are constantly reminded that we are all standing on the shoulders of giants. However, particularly this year, we also see just how important small personal connections are. For me, my involvement with open source communities really took off when I joined the QGIS hackfest in Vienna in 2009 and I felt that my participation was really appreciated and welcome. I couldn’t imagine being without these connections anymore.

Thank you to the whole QGIS community, particularly my fellow PSC members both current and former: Tim, Andreas, Jürgen, Richard, Paolo, Otto, Marco Hugentobler, Alessandro, our new chair Marco Bernasocchi, and of course QGIS founder Gary Sherman for starting this awesome project and for still being around and actively promoting geospatial open source by publishing so many great books covering multiple different OSGeo projects.

I’d also like to thank my partner and my family for being incredibly understanding whenever I’m spend my time geeking out over a new programming project, data analysis, forum question, or conference talk.

Thank you also to my friends, colleagues and fellow members of the larger OSGeo community for sharing ideas, providing valuable feedback, and spreading the word about all the great work that’s going on all around us.

I’m constantly amazed by all the innovation happening to nourish and grow our community. And I’m looking forward to continue being a part of these efforts.

Thank you!

 

Extracting trajectory-based flows between M³ prototypes

Rendering large sets of trajectory lines gets messy fast. Different aggregation approaches have been developed to address this issue. However, most approaches, such as mobility graphs or generalized flow maps, cannot handle large input datasets. Building on M³ prototypes, the following approach can be used in distributed computing environments to extracts flows from large datasets. 

This is part 3 of “Exploring massive movement datasets”.

This flow extraction is based on a two-step process, conceptually similar to Andrienko flow maps: first, we extract M³ prototypes from the movement data. In the second step, we determine flows between these prototypes, including information about: distribution of travel speeds and number of observed transitions. The resulting flows can be visualized, for example, to explore the popularity of different paths of movement:

After the prototypes have been computed, the flow algorithm computes transitions between pairs of prototypes. An object moving from prototype A to prototype B triggers an update of the corresponding flow. To allow for distributed processing, each node in the distributed computing environment needs a copy of the previously computed prototypes. Additionally, the raw movement data records need to be converted into trajectories. Afterwards, each trajectory is processed independently, going through its records in chronological order:

  1. Find the best matching prototype for the current record
  2. Ensure that the distance to the match is below the distance threshold and that the matched prototype is different from the previous prototype
  3. Get or create the flow between the two prototypes
  4. Ensure that the prototype and flow directions are a good match for the current record’s direction
  5. Update the flow properties: travel speed and number of transitions, as well as the previous prototype reference

This approach scales to large datasets since only the prototypes, the (intermediate) flow results, and the trajectory currently being worked on have to be kept in memory for each iteration. However, this algorithm does not allow for continuous updates. Flows would have to be recomputed (at least locally) whenever prototypes changed. Therefore, the algorithm does not support exploration of continuous data streams. However, it can be used to explore large historical datasets:

Flow example: passenger vessel speed patterns showing mean flow speeds (line color: darker colors equal higher speeds) and speed variation (line width)

If you want to dive deeper, here’s the full paper:

[1] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). Extracting Patterns from Large Movement Datasets. GI_Forum – Journal of Geographic Information Science, 1-2020, 153-163. doi:10.1553/giscience2020_01_s153.


This post is part of a series. Read more about movement data in GIS.

Generating trajectories from massive movement datasets

To explore travel patterns like origin-destination relationships, we need to identify individual trips with their start/end locations and trajectories between them. Extracting these trajectories from large datasets can be challenging, particularly if the records of individual moving objects don’t fit into memory anymore and if the spatial and temporal extent varies widely (as is the case with ship data, where individual vessel journeys can take weeks while crossing multiple oceans). 

This is part 2 of “Exploring massive movement datasets”.

Roughly speaking, trip trajectories can be generated by first connecting consecutive records into continuous tracks and then splitting them at stops. This general approach applies to many different movement datasets. However, the processing details (e.g. stop detection parameters) and preprocessing steps (e.g. removing outliers) vary depending on input dataset characteristics.

For example, in our paper [1], we extracted vessel journeys from AIS data which meant that we also had to account for observation gaps when ships leave the observable (usually coastal) areas. In the accompanying 10-minute talk, I went through a 4-step trajectory exploration workflow for assessing our dataset’s potential for travel time prediction:

Click to watch the recorded talk

Like the M³ prototype computation presented in part 1, our trajectory aggregation approach is implemented in Spark. The challenges are both the massive amounts of trajectory data and the fact that operations only produce correct results if applied to a complete and chronologically sorted set of location records.This is challenging because Spark core libraries (version 2.4.5 at the time) are mostly geared towards dealing with unsorted data. This means that, when using high-level Spark core functionality incorrectly, an aggregator needs to collect and sort the entire track in the main memory of a single processing node. Consequently, when dealing with large datasets, out-of-memory errors are frequently encountered.

To solve this challenge, our implementation is based on the Secondary Sort pattern and on Spark’s aggregator concept. Secondary Sort takes care to first group records by a key (e.g. the moving object id), and only in the second step, when iterating over the records of a group, the records are sorted (e.g. chronologically). The resulting iterator can be used by an aggregator that implements the logic required to build trajectories based on gaps and stops detected in the dataset.

If you want to dive deeper, here’s the full paper:

[1] Graser, A., Dragaschnig, M., Widhalm, P., Koller, H., & Brändle, N. (2020). Exploratory Trajectory Analysis for Massive Historical AIS Datasets. In: 21st IEEE International Conference on Mobile Data Management (MDM) 2020. doi:10.1109/MDM48529.2020.00059


This post is part of a series. Read more about movement data in GIS.

Movement data in GIS #31: exploring massive movement datasets

Exploring large movement datasets is hard because visualizations of movement data quickly get cluttered and hard to interpret. Therefore, we need to aggregate the data. Density maps are commonly used since they are readily available and quick to compute but they provide only very limited insight. In contrast, meaningful aggregations that can help discover patterns are computationally expensive and therefore slow to generate.

This post serves as a starting point for a series of new approaches to exploring massive movement data. This series will summarize parts of my PhD research and – for those of you who are interested in more details – there will be links to the relevant papers.

Starting with the raw location records, we use different forms of aggregation to learn more about what information a movement dataset contains:

  1. Summarizing movement using prototypes by aggregating raw location records using our flexible M³ Massive Movement Model [1]
  2. Generating trajectories by connecting consecutive records into continuous tracks and splitting them into meaningful trajectories [2]
  3. Extracting flows by summarizing trajectory-based transitions between prototypes [3]

Besides clever aggregation approaches, massive movement datasets also require appropriate computing resources. To ensure that we can efficiently explore large datasets, we have implemented the above mentioned aggregation steps in Spark. This enables us to run the computations on general purpose computing clusters that can be scaled according to the dataset size.

In the next post, we’ll look at how to summarize movement using M³ prototypes. So stay tuned!

But if you don’t want to wait, these are the original papers:

[1] Graser. A., Widhalm, P., & Dragaschnig, M. (2020). The M³ massive movement model: a distributed incrementally updatable solution for big movement data exploration. International Journal of Geographical Information Science. doi:10.1080/13658816.2020.1776293.
[2] Graser, A., Dragaschnig, M., Widhalm, P., Koller, H., & Brändle, N. (2020). Exploratory Trajectory Analysis for Massive Historical AIS Datasets. In: 21st IEEE International Conference on Mobile Data Management (MDM) 2020. doi:10.1109/MDM48529.2020.00059
[3] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). Extracting Patterns from Large Movement Datasets. GI_Forum – Journal of Geographic Information Science, 1-2020, 153-163. doi:10.1553/giscience2020_01_s153.


This post is part of a series. Read more about movement data in GIS.

Back to Top

Sustaining Members