Related Plugins and Tags

QGIS Planet

Analyzing video-based bicycle trajectories

Did you know that MovingPandas also supports local image coordinates? Indeed, it does.

In today’s post, we will explore how we can use this feature to analyze bicycle tracks extracted from video footage published by Michael Szell @mszll:

The bicycle trajectory coordinates are stored in two separate lists: xs_640x360 and ys640x360:

This format is kind of similar to the Kaggle Taxi dataset, we worked with in the previous post. However, to use the solution we implemented there, we need to combine the x and y coordinates into nice (x,y) tuples:

df['coordinates'] = df.apply(
    lambda row: list(zip(row['xs_640x360'], row['ys_640x360'])), axis=1)
df.drop(columns=['xs_640x360', 'ys_640x360'], inplace=True)

Afterwards, we can create the points and compute the proper timestamps from the frame numbers:

def compute_datetime(row):
    # some educated guessing going on here: the paper states that the video covers 2021-06-09 07:00-08:00
    d = datetime(2021,6,9,7,0,0) + (row['frame_in'] + row['running_number']) * timedelta(seconds=2)
    return d
def create_point(xy):
    try: 
        return Point(xy)
    except TypeError:  # when there are nan values in the input data
        return None
new_df = df.head().explode('coordinates')
new_df['geometry'] = new_df['coordinates'].apply(create_point)
new_df['running_number'] = new_df.groupby('id').cumcount()
new_df['datetime'] = new_df.apply(compute_datetime, axis=1)
new_df.drop(columns=['coordinates', 'frame_in', 'running_number'], inplace=True)
new_df

Once the points and timestamps are ready, we can create the MovingPandas TrajectoryCollection. Note how we explicitly state that there is no CRS for this dataset (crs=None):

trajs = mpd.TrajectoryCollection(
    gpd.GeoDataFrame(new_df), 
    traj_id_col='id',  t='datetime', crs=None)

Plotting trajectories with image coordinates

Similarly, to plot these trajectories, we should tell hvplot that it should not fetch any background map tiles (’tiles’:None) and that the coordinates are not geographic (‘geo’:False):

If you want to explore the full source code, you can find my Github fork with the Jupyter notebook at: https://github.com/anitagraser/desirelines/blob/main/mpd.ipynb

The repository also contains a camera image of the intersection, which we can use as a background for our trajectory plots:

bg_img = hv.RGB.load_image('img/intersection2.png', bounds=(0,0,640,360)) 

One important caveat is that speed will be calculated in pixels per second. So when we plot the bicycle speed, the segments closer to the camera will appear faster than the segments in the background:

To fix this issue, we would have to correct for the distortions of the camera lens and perspective. I’m sure that there is specialized software for this task but, for the purpose of this post, I’m going to grab the opportunity to finally test out the VectorBender plugin.

Georeferencing the trajectories using QGIS VectorBender plugin

Let’s load the five test trajectories and the camera image to QGIS. To make sure that they align properly, both are set to the same CRS and I’ve created the following basic world file for the camera image:

1
0
0
-1
0
360

Then we can use the VectorBender tools to georeference the trajectories by linking locations from the camera image to locations on aerial images. You can see the whole process in action here:

After around 15 minutes linking control points, VectorBender comes up with the following georeferenced trajectory result:

Not bad for a quick-and-dirty hack. Some points on the borders of the image could not be georeferenced since I wasn’t always able to identify suitable control points at the camera image borders. So it won’t be perfect but should improve speed estimates.


This post is part of a series. Read more about movement data in GIS.

How to use Kaggle’s Taxi Trajectory Data in MovingPandas

Kaggle’s “Taxi Trajectory Data from ECML/PKDD 15: Taxi Trip Time Prediction (II) Competition” is one of the most used mobility / vehicle trajectory datasets in computer science. However, in contrast to other similar datasets, Kaggle’s taxi trajectories are provided in a format that is not readily usable in MovingPandas since the spatiotemporal information is provided as:

  • TIMESTAMP: (integer) Unix Timestamp (in seconds). It identifies the trip’s start;
  • POLYLINE: (String): It contains a list of GPS coordinates (i.e. WGS84 format) mapped as a string. The beginning and the end of the string are identified with brackets (i.e. [ and ], respectively). Each pair of coordinates is also identified by the same brackets as [LONGITUDE, LATITUDE]. This list contains one pair of coordinates for each 15 seconds of trip. The last list item corresponds to the trip’s destination while the first one represents its start;

Therefore, we need to create a DataFrame with one point + timestamp per row before we can use MovingPandas to create Trajectories and analyze them.

But first things first. Let’s download the dataset:

import datetime
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import opendatasets as od
from os.path import exists
from shapely.geometry import Point

input_file_path = 'taxi-trajectory/train.csv'

def get_porto_taxi_from_kaggle():
    if not exists(input_file_path):
        od.download("https://www.kaggle.com/datasets/crailtap/taxi-trajectory")

get_porto_taxi_from_kaggle()
df = pd.read_csv(input_file_path, nrows=10, usecols=['TRIP_ID', 'TAXI_ID', 'TIMESTAMP', 'MISSING_DATA', 'POLYLINE'])
df.POLYLINE = df.POLYLINE.apply(eval)  # string to list
df

And now for the remodelling:

def unixtime_to_datetime(unix_time):
    return datetime.datetime.fromtimestamp(unix_time)

def compute_datetime(row):
    unix_time = row['TIMESTAMP']
    offset = row['running_number'] * datetime.timedelta(seconds=15)
    return unixtime_to_datetime(unix_time) + offset

def create_point(xy):
    try: 
        return Point(xy)
    except TypeError:  # when there are nan values in the input data
        return None
 
new_df = df.explode('POLYLINE')
new_df['geometry'] = new_df['POLYLINE'].apply(create_point)
new_df['running_number'] = new_df.groupby('TRIP_ID').cumcount()
new_df['datetime'] = new_df.apply(compute_datetime, axis=1)
new_df.drop(columns=['POLYLINE', 'TIMESTAMP', 'running_number'], inplace=True)
new_df

And that’s it. Now we can create the trajectories:

trajs = mpd.TrajectoryCollection(
    gpd.GeoDataFrame(new_df, crs=4326), 
    traj_id_col='TRIP_ID', obj_id_col='TAXI_ID', t='datetime')
trajs.hvplot(title='Kaggle Taxi Trajectory Data', tiles='CartoLight')

That’s it. Now our MovingPandas.TrajectoryCollection is ready for further analysis.

By the way, the plot above illustrates a new feature in the recent MovingPandas 0.16 release which, among other features, introduced plots with arrow markers that show the movement direction. Other new features include a completely new custom distance, speed, and acceleration unit support. This means that, for example, instead of always getting speed in meters per second, you can now specify your desired output units, including km/h, mph, or nm/h (knots).


This post is part of a series. Read more about movement data in GIS.

Visualizing trajectories with QGIS & MobilityDB

In the previous post, we — creatively ;-) — used MobilityDB to visualize stationary IOT sensor measurements.

This post covers the more obvious use case of visualizing trajectories. Thus bringing together the MobilityDB trajectories created in Detecting close encounters using MobilityDB 1.0 and visualization using Temporal Controller.

Like in the previous post, the valueAtTimestamp function does the heavy lifting. This time, we also apply it to the geometry time series column called trip:

SELECT mmsi,
    valueAtTimestamp(trip, '2017-05-07 08:55:40') geom,
    valueAtTimestamp(SOG, '2017-05-07 08:55:40') SOG
FROM "public"."ships"

Using this SQL query, we again set up a — not yet Temporal Controller-controlled — QueryLayer.

To configure Temporal Controller to update the timestamp in our SQL query, we again need to run the Python script from the previous post.

With this done, we are all set up to animate and explore the movement patterns in our dataset:


This post is part of a series. Read more about movement data in GIS.

Visualizing IOT time series with QGIS & MobilityDB

Today’s post presents an experiment in modelling a common scenario in many IOT setups: time series of measurements at stationary sensors. The key idea I want to explore is to use MobilityDB’s temporal data types, in particular the tfloat_inst and tfloat_seq for instances and sequences of temporal float values, respectively.

For info on how to set up MobilityDB, please check my previous post.

Setting up our DB tables

As a toy example, let’s create two IOT devices (in table iot_devices) with three measurements each (in table iot_measurements) and join them to create the tfloat_seq (in table iot_joined):

CREATE TABLE iot_devices (
    id integer,
    geom geometry(Point, 4326)
);

INSERT INTO iot_devices (id, geom) VALUES
(1, ST_SetSRID(ST_MakePoint(1,1), 4326)),
(2, ST_SetSRID(ST_MakePoint(2,3), 4326));

CREATE TABLE iot_measurements (
    device_id integer,
    t timestamp,
    measurement float
);

INSERT INTO iot_measurements (device_id, t, measurement) VALUES
(1, '2022-10-01 12:00:00', 5.0),
(1, '2022-10-01 12:01:00', 6.0),
(1, '2022-10-01 12:02:00', 10.0),
(2, '2022-10-01 12:00:00', 9.0),
(2, '2022-10-01 12:01:00', 6.0),
(2, '2022-10-01 12:02:00', 1.5);

CREATE TABLE iot_joined AS
SELECT 
    dev.id, 
    dev.geom, 
    tfloat_seq(array_agg(
        tfloat_inst(m.measurement, m.t) ORDER BY t
    )) measurements
FROM iot_devices dev 
JOIN iot_measurements m
  ON dev.id = m.device_id
GROUP BY dev.id, dev.geom;

We can load the resulting layer in QGIS but QGIS won’t be happy about the measurements column because it does not recognize its data type:

Query layer with valueAtTimestamp

Instead, what we can do is create a query layer that fetches the measurement value at a specific timestamp:

SELECT id, geom, 
    valueAtTimestamp(measurements, '2022-10-01 12:02:00') 
FROM iot_joined

Which gives us a layer that QGIS is happy with:

Time for TemporalController

Now the tricky question is: how can we wire our query layer to the Temporal Controller so that we can control the timestamp and animate the layer?

I don’t have a GUI solution yet but here’s a way to do it with PyQGIS: whenever the Temporal Controller signal updateTemporalRange is emitted, our update_query_layer function gets the current time frame start time and replaces the datetime in the query layer’s data source with the current time:

l = iface.activeLayer()
tc = iface.mapCanvas().temporalController()

def update_query_layer():
    tct = tc.dateTimeRangeForFrameNumber(tc.currentFrameNumber()).begin().toPyDateTime()
    s = l.source()
    new = re.sub(r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", str(tct), s)
    l.setDataSource(new, l.sourceName(), l.dataProvider().name())

tc.updateTemporalRange.connect(update_query_layer)

Future experiments will have to show how this approach performs on lager datasets but it’s exciting to see how MobilityDB’s temporal types may be visualized in QGIS without having to create tables/views that join a geometry to each and every individual measurement.

Detecting close encounters using MobilityDB 1.0

It’s been a while since we last talked about MobilityDB in 2019 and 2020. Since then, the project has come a long way. It joined OSGeo as a community project and formed a first PSC, including the project founders Mahmoud Sakr and Esteban Zimányi as well as Vicky Vergara (of pgRouting fame) and yours truly.

This post is a quick teaser tutorial from zero to computing closest points of approach (CPAs) between trajectories using MobilityDB.

Setting up MobilityDB with Docker

The easiest way to get started with MobilityDB is to use the ready-made Docker container provided by the project. I’m using Docker and WSL (Windows Subsystem Linux on Windows 10) here. Installing WLS/Docker is out of scope of this post. Please refer to the official documentation for your operating system.

Once Docker is ready, we can pull the official container and fire it up:

docker pull mobilitydb/mobilitydb
docker volume create mobilitydb_data
docker run --name "mobilitydb" -d -p 25432:5432 -v mobilitydb_data:/var/lib/postgresql mobilitydb/mobilitydb
psql -h localhost -p 25432 -d mobilitydb -U docker

Currently, the container provides PostGIS 3.2 and MobilityDB 1.0:

Loading movement data into MobilityDB

Once the container is running, we can already connect to it from QGIS. This is my preferred way to load data into MobilityDB because we can simply drag-and-drop any timestamped point layer into the database:

For this post, I’m using an AIS data sample in the region of Gothenburg, Sweden.

After loading this data into a new table called ais, it is necessary to remove duplicate and convert timestamps:

CREATE TABLE AISInputFiltered AS
SELECT DISTINCT ON("MMSI","Timestamp") *
FROM ais;

ALTER TABLE AISInputFiltered ADD COLUMN t timestamp;
UPDATE AISInputFiltered SET t = "Timestamp"::timestamp;

Afterwards, we can create the MobilityDB trajectories:

CREATE TABLE Ships AS
SELECT "MMSI" mmsi,
tgeompoint_seq(array_agg(tgeompoint_inst(Geom, t) ORDER BY t)) AS Trip,
tfloat_seq(array_agg(tfloat_inst("SOG", t) ORDER BY t) FILTER (WHERE "SOG" IS NOT NULL) ) AS SOG,
tfloat_seq(array_agg(tfloat_inst("COG", t) ORDER BY t) FILTER (WHERE "COG" IS NOT NULL) ) AS COG
FROM AISInputFiltered
GROUP BY "MMSI";

ALTER TABLE Ships ADD COLUMN Traj geometry;
UPDATE Ships SET Traj = trajectory(Trip);

Once this is done, we can load the resulting Ships layer and the trajectories will be loaded as lines:

Computing closest points of approach

To compute the closest point of approach between two moving objects, MobilityDB provides a shortestLine function. To be correct, this function computes the line connecting the nearest approach point between the two tgeompoint_seq. In addition, we can use the time-weighted average function twavg to compute representative average movement speeds and eliminate stationary or very slowly moving objects:

SELECT S1.MMSI mmsi1, S2.MMSI mmsi2, 
       shortestLine(S1.trip, S2.trip) Approach,
       ST_Length(shortestLine(S1.trip, S2.trip)) distance
FROM Ships S1, Ships S2
WHERE S1.MMSI > S2.MMSI AND
twavg(S1.SOG) > 1 AND twavg(S2.SOG) > 1 AND
dwithin(S1.trip, S2.trip, 0.003)

In the QGIS Browser panel, we can right-click the MobilityDB connection to bring up an SQL input using Execute SQL:

The resulting query layer shows where moving objects get close to each other:

To better see what’s going on, we’ll look at individual CPAs:

Having a closer look with the Temporal Controller

Since our filtered AIS layer has proper timestamps, we can animate it using the Temporal Controller. This enables us to replay the movement and see what was going on in a certain time frame.

I let the animation run and stopped it once I spotted a close encounter. Looking at the AIS points and the shortest line, we can see that MobilityDB computed the CPAs along the trajectories:

A more targeted way to investigate a specific CPA is to use the Temporal Controllers’ fixed temporal range mode to jump to a specific time frame. This is helpful if we already know the time frame we are interested in. For the CPA use case, this means that we can look up the timestamp of a nearby AIS position and set up the Temporal Controller accordingly:

More

I hope you enjoyed this quick dive into MobilityDB. For more details, including talks by the project founders, check out the project website.


This post is part of a series. Read more about movement data in GIS.

MF-JSON update & tutorial with official sample

Since last week’s post, I’ve learned that there is an official OGC Moving Features JSON Encodings repository with more recent sample datasets, including MovingPoint, MovingPolygon, and Trajectory JSON examples.

The MovingPoint example seems to describe a storm, including its path (temporalGeometry), pressure, wind strength, and class values (temporalProperties):

You can give the current implementation a spin using this MyBinder notebook

An exciting future step would be to experiment with extending MovingPandas to support the MovingPolygon MF-JSON examples. MovingPolygons can change their size and orientation as they move. I’m not yet sure, however, if the number of polygon nodes can change between time steps and how this would be reflected by the prism concept presented in the draft specification:

Image source: https://ksookim.github.io/mf-json/

Geospatial: where MovingPandas meets Leafmap

Many of you certainly have already heard of and/or even used Leafmap by Qiusheng Wu.

Leafmap is a Python package for interactive spatial analysis with minimal coding in Jupyter environments. It provides interactive maps based on folium and ipyleaflet, spatial analysis functions using WhiteboxTools and whiteboxgui, and additional GUI elements based on ipywidgets.

This way, Leafmap achieves a look and feel that is reminiscent of a desktop GIS:

Image source: https://github.com/giswqs/leafmap

Recently, Qiusheng has started an additional project: the geospatial meta package which brings together a variety of different Python packages for geospatial analysis. As such, the main goals of geospatial are to make it easier to discover and use the diverse packages that make up the spatial Python ecosystem.

Besides the usual suspects, such as GeoPandas and of course Leafmap, one of the packages included in geospatial is MovingPandas. Thanks, Qiusheng!

I’ve tested the mamba install today and am very happy with how this worked out. There is just one small hiccup currently, which is related to an upstream jinja2 issue. After installing geospatial, I therefore downgraded jinja:

mamba install -c conda-forge geospatial 
mamba install -c conda-forge jinja2=3.0

Of course, I had to try Leafmap and MovingPandas in action together. Therefore, I fired up one of the MovingPandas example notebook (here the example on clipping trajectories using polygons). As you can see, the integration is pretty smooth since Leafmap already support drawing GeoPandas GeoDataFrames and MovingPandas can convert trajectories to GeoDataFrames (both lines and points):

Clipped trajectory segments as linestrings in Leafmap
Leafmap includes an attribute table view that can be activated on user request to show, e.g. trajectory information
And, of course, we can also map the original trajectory points

Geospatial also includes the new dask-geopandas library which I’m very much looking forward to trying out next.

MovingPandas now supports local coordinates

MovingPandas 0.9rc3 has just been released, including important fixes for local coordinate support. Sports analytics is just one example of movement data analysis that uses local rather than geographic coordinates.

Many movement data sources – such as soccer players’ movements extracted from video footage – use local reference systems. This means that x and y represent positions within an arbitrary frame, such as a soccer field.

Since Geopandas and GeoViews support handling and plotting local coordinates just fine, there is nothing stopping us from applying all MovingPandas functionality to this data. For example, to visualize the movement speed of players:

Of course, we can also plot other trajectory attributes, such as the team affiliation.

But one particularly useful feature is the ability to use custom background images, for example, to show the soccer field layout:

To access the full example notebook, visit: https://github.com/anitagraser/movingpandas/blob/master/tutorials/5-local-coordinates.ipynb

An update to the MovingPandas examples repository will follow shortly.

Exploring ZAMG’s new open weather data

The Central Institution for Meteorology and Geodynamics (ZAMG) is Austrian’s meteorological and geophysical service. And as such, they have a large database of historical weather data which they have now made publicly available, as announced on 28th Oct 2021:

The new ZAMG Data Hub provides weather and station data, mainly in NetCDF and CSV formats:

I decided to grab a NetCDF sample from their analysis and nowcasting system INCA. I went with all available parameters for a period of one day (the data has a temporal resolution of one hour) and a bounding box around Vienna:

https://frontend.hub.zamg.ac.at/grid/d512d5b5-4e9f-4954-98b9-806acbf754f6/historical/form?anonymous=true

The loading screen of QGIS 3.22 shows the different NetCDF layers:

After adding the incal-hourly layer to QGIS, the layer styling panel provides access to the different weather parameters. We can switch between these parameters by clicking the gradient icon next to the parameter names. Here you can see the air temperature:

And because the NetCDF layer is time-aware, we can also use the QGIS Temporal Controller to step through the hourly measurements / create an animation:

Make sure to grab the latest version of QGIS to get access to all the functionality shown here.

Movement data in GIS #34: a protocol for exploring movement data

After writing “Towards a template for exploring movement data” last year, I spent a lot of time thinking about how to develop a solid approach for movement data exploration that would help analysts and scientists to better understand their datasets. Finally, my search led me to the excellent paper “A protocol for data exploration to avoid common statistical problems” by Zuur et al. (2010). What they had done for the analysis of common ecological datasets was very close to what I was trying to achieve for movement data. I followed Zuur et al.’s approach of a exploratory data analysis (EDA) protocol and combined it with a typology of movement data quality problems building on Andrienko et al. (2016). Finally, I brought it all together in a Jupyter notebook implementation which you can now find on Github.

There are two options for running the notebook:

  1. The repo contains a Dockerfile you can use to spin up a container including all necessary datasets and a fitting Python environment.
  2. Alternatively, you can download the datasets manually and set up the Python environment using the provided environment.yml file.

The dataset contains over 10 million location records. Most visualizations are based on Holoviz Datashader with a sprinkling of MovingPandas for visualizing individual trajectories.

Point density map of 10 million location records, visualized using Datashader

Line density map for detecting gaps in tracks, visualized using Datashader

Example trajectory with strong jitter, visualized using MovingPandas & GeoViews

 

I hope this reference implementation will provide a starting point for many others who are working with movement data and who want to structure their data exploration workflow.

If you want to dive deeper, here’s the paper:

[1] Graser, A. (2021). An exploratory data analysis protocol for identifying problems in continuous movement data. Journal of Location Based Services. doi:10.1080/17489725.2021.1900612.

(If you don’t have institutional access to the journal, the publisher provides 50 free copies using this link. Once those are used up, just leave a comment below and I can email you a copy.)

References


This post is part of a series. Read more about movement data in GIS.

Extracting trajectory-based flows between M³ prototypes

Rendering large sets of trajectory lines gets messy fast. Different aggregation approaches have been developed to address this issue. However, most approaches, such as mobility graphs or generalized flow maps, cannot handle large input datasets. Building on M³ prototypes, the following approach can be used in distributed computing environments to extracts flows from large datasets. 

This is part 3 of “Exploring massive movement datasets”.

This flow extraction is based on a two-step process, conceptually similar to Andrienko flow maps: first, we extract M³ prototypes from the movement data. In the second step, we determine flows between these prototypes, including information about: distribution of travel speeds and number of observed transitions. The resulting flows can be visualized, for example, to explore the popularity of different paths of movement:

After the prototypes have been computed, the flow algorithm computes transitions between pairs of prototypes. An object moving from prototype A to prototype B triggers an update of the corresponding flow. To allow for distributed processing, each node in the distributed computing environment needs a copy of the previously computed prototypes. Additionally, the raw movement data records need to be converted into trajectories. Afterwards, each trajectory is processed independently, going through its records in chronological order:

  1. Find the best matching prototype for the current record
  2. Ensure that the distance to the match is below the distance threshold and that the matched prototype is different from the previous prototype
  3. Get or create the flow between the two prototypes
  4. Ensure that the prototype and flow directions are a good match for the current record’s direction
  5. Update the flow properties: travel speed and number of transitions, as well as the previous prototype reference

This approach scales to large datasets since only the prototypes, the (intermediate) flow results, and the trajectory currently being worked on have to be kept in memory for each iteration. However, this algorithm does not allow for continuous updates. Flows would have to be recomputed (at least locally) whenever prototypes changed. Therefore, the algorithm does not support exploration of continuous data streams. However, it can be used to explore large historical datasets:

Flow example: passenger vessel speed patterns showing mean flow speeds (line color: darker colors equal higher speeds) and speed variation (line width)

If you want to dive deeper, here’s the full paper:

[1] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). Extracting Patterns from Large Movement Datasets. GI_Forum – Journal of Geographic Information Science, 1-2020, 153-163. doi:10.1553/giscience2020_01_s153.


This post is part of a series. Read more about movement data in GIS.

Spatial on air #2: spatiotemporal everything!

We’ve done it again!

This time, Daniel O’Donohue and I talked about spatiotemporal data in GIS, including – of course – Time Manager, the new QGIS temporal support, and MovingPandas.

 

Since we need both data and tools to do spatiotemporal analysis, we also talked about file formats and data models. If you want to know more about data models for spatiotemporal (especially movement) data, have a look at the latest discussion paper I wrote together with Esteban Zimányi (MobilityDB) and Krishna Chaitanya Bommakanti (mobilitydb-sqlalchemy):

Data model of the Moving Features standard illustrated with two moving points A and B. Stars mark changes in attribute values. (Source: Graser et al. (2020))

For more details and all options for listening to this podcast, visit mapscaping.com.

 

M³ Massive Movement Model: aggregating movement data using prototypes

Visualizations of raw movement data records, that is, simple point maps or point density (“heat”) maps provide very limited data exploration capabilities. Therefore, we need clever aggregation approaches that can actually reveal movement patterns. Many existing aggregation approaches, however, do not scale to large datasets. We therefore developed the M³ Massive Movement Model [1] which supports distributed computing environments and can be incrementally updated with new data.

This is part 1 of “Exploring massive movement datasets”.

Using state-of-the-art big gespatial tools, such as GeoMesa, it is quite straightforward to ingest, index and query large amounts of timestamped location records. Thanks to GeoMesa’s GeoServer integration, it is also possible to publish GeoMesa tables as WMS and WFS which can be visualized in QGIS and explored (for more about GeoMesa, see Scalable spatial vector data processing ).So far so good! But with this basic setup, we only get point maps and point density maps which don’t tell us much about important movement characteristics like speed and direction (particularly if the reporting interval between consecutive location records is irregular). Therefore, we developed an aggregation method which models local record density, as well as movement speed and direction which we call M³.

For distributed computation, we need to split large datasets into chunks. To build models of local movement characteristics, it makes sense to create spatial or spatiotemporal chunks that can be processed independently. We therefore split the data along a regular grid but instead of computing one average value per grid cell, we create a flexible number of prototypes that describe the movement in the cell. Each prototype models a location, speed, and direction distribution (mean and sigma).

In our paper, we used M³ to explore ship movement data. We turned roughly 4 billion AIS records into prototypes:

M³ for ship movement data during January to December 2017 (3.9 billion records turned into 3.4 million prototypes; computing time: 41 minutes)

The above plot really only gives a first impression of the spatial distribution of ship movement records. The real value of M³ becomes clearer when we zoom in and start exploring regional patterns. Then we can discover vessel routes, speeds, and movement directions:

The prototype details on the right side, in particular, show the strength of the prototype idea: even though the grid cells we use are rather large, the prototypes clearly form along vessel routes. We can see exactly where these routes are and what speeds ship travel there, without having to increase the grid resolution to impractical values. Slow prototypes with high direction sigma (red+black markers) are clear indicators of ports. The marker size shows the number of records per prototype and thus helps distinguish heavily traveled routes from minor ones.

M³ is implemented in Spark. We read raw location records from GeoMesa and write prototypes to GeoMesa. All maps have been created in QGIS using prototype data published as GeoServer WFS.

If you want to dive deeper, here’s the full paper:

[1] Graser. A., Widhalm, P., & Dragaschnig, M. (2020). The M³ massive movement model: a distributed incrementally updatable solution for big movement data exploration. International Journal of Geographical Information Science. doi:10.1080/13658816.2020.1776293.


This post is part of a series. Read more about movement data in GIS.

Generating trajectories from massive movement datasets

To explore travel patterns like origin-destination relationships, we need to identify individual trips with their start/end locations and trajectories between them. Extracting these trajectories from large datasets can be challenging, particularly if the records of individual moving objects don’t fit into memory anymore and if the spatial and temporal extent varies widely (as is the case with ship data, where individual vessel journeys can take weeks while crossing multiple oceans). 

This is part 2 of “Exploring massive movement datasets”.

Roughly speaking, trip trajectories can be generated by first connecting consecutive records into continuous tracks and then splitting them at stops. This general approach applies to many different movement datasets. However, the processing details (e.g. stop detection parameters) and preprocessing steps (e.g. removing outliers) vary depending on input dataset characteristics.

For example, in our paper [1], we extracted vessel journeys from AIS data which meant that we also had to account for observation gaps when ships leave the observable (usually coastal) areas. In the accompanying 10-minute talk, I went through a 4-step trajectory exploration workflow for assessing our dataset’s potential for travel time prediction:

Click to watch the recorded talk

Like the M³ prototype computation presented in part 1, our trajectory aggregation approach is implemented in Spark. The challenges are both the massive amounts of trajectory data and the fact that operations only produce correct results if applied to a complete and chronologically sorted set of location records.This is challenging because Spark core libraries (version 2.4.5 at the time) are mostly geared towards dealing with unsorted data. This means that, when using high-level Spark core functionality incorrectly, an aggregator needs to collect and sort the entire track in the main memory of a single processing node. Consequently, when dealing with large datasets, out-of-memory errors are frequently encountered.

To solve this challenge, our implementation is based on the Secondary Sort pattern and on Spark’s aggregator concept. Secondary Sort takes care to first group records by a key (e.g. the moving object id), and only in the second step, when iterating over the records of a group, the records are sorted (e.g. chronologically). The resulting iterator can be used by an aggregator that implements the logic required to build trajectories based on gaps and stops detected in the dataset.

If you want to dive deeper, here’s the full paper:

[1] Graser, A., Dragaschnig, M., Widhalm, P., Koller, H., & Brändle, N. (2020). Exploratory Trajectory Analysis for Massive Historical AIS Datasets. In: 21st IEEE International Conference on Mobile Data Management (MDM) 2020. doi:10.1109/MDM48529.2020.00059


This post is part of a series. Read more about movement data in GIS.

Movement data in GIS #31: exploring massive movement datasets

Exploring large movement datasets is hard because visualizations of movement data quickly get cluttered and hard to interpret. Therefore, we need to aggregate the data. Density maps are commonly used since they are readily available and quick to compute but they provide only very limited insight. In contrast, meaningful aggregations that can help discover patterns are computationally expensive and therefore slow to generate.

This post serves as a starting point for a series of new approaches to exploring massive movement data. This series will summarize parts of my PhD research and – for those of you who are interested in more details – there will be links to the relevant papers.

Starting with the raw location records, we use different forms of aggregation to learn more about what information a movement dataset contains:

  1. Summarizing movement using prototypes by aggregating raw location records using our flexible M³ Massive Movement Model [1]
  2. Generating trajectories by connecting consecutive records into continuous tracks and splitting them into meaningful trajectories [2]
  3. Extracting flows by summarizing trajectory-based transitions between prototypes [3]

Besides clever aggregation approaches, massive movement datasets also require appropriate computing resources. To ensure that we can efficiently explore large datasets, we have implemented the above mentioned aggregation steps in Spark. This enables us to run the computations on general purpose computing clusters that can be scaled according to the dataset size.

In the next post, we’ll look at how to summarize movement using M³ prototypes. So stay tuned!

But if you don’t want to wait, these are the original papers:

[1] Graser. A., Widhalm, P., & Dragaschnig, M. (2020). The M³ massive movement model: a distributed incrementally updatable solution for big movement data exploration. International Journal of Geographical Information Science. doi:10.1080/13658816.2020.1776293.
[2] Graser, A., Dragaschnig, M., Widhalm, P., Koller, H., & Brändle, N. (2020). Exploratory Trajectory Analysis for Massive Historical AIS Datasets. In: 21st IEEE International Conference on Mobile Data Management (MDM) 2020. doi:10.1109/MDM48529.2020.00059
[3] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). Extracting Patterns from Large Movement Datasets. GI_Forum – Journal of Geographic Information Science, 1-2020, 153-163. doi:10.1553/giscience2020_01_s153.


This post is part of a series. Read more about movement data in GIS.

Movement data in GIS #30: synchronized trajectory animations with QGIS temporal controller

QGIS Temporal Controller is a powerful successor of TimeManager. Temporal Controller is a new core feature of the current development version and will be shipped with the 3.14 release. This post demonstrates two key advantages of this new temporal support:

  1. Expression support for defining start and end timestamps
  2. Integration into the PyQGIS API

These features come in very handy in many use cases. For example, they make it much easier to create animations from folders full of GPS tracks since the files can now be loaded and configured automatically:

Script & Temporal Controller in action (click for full resolution)

All tracks start at the same location but at different times. (Kudos for Andrew Fletcher for recordings these tracks and sharing them with me!) To create an animation that shows all tracks start simultaneously, we need to synchronize them. This synchronization can be achieved on-the-fly by subtracting the start time from all track timestamps using an expression:

directory = "E:/Google Drive/QGIS_Course/05_TimeManager/Example_Dayrides/"

def load_and_configure(filename):
    path = os.path.join(directory, filename)
    uri = 'file:///' + path + "?type=csv&escape=&useHeader=No&detectTypes=yes"
    uri = uri + "&crs=EPSG:4326&xField=field_3&yField=field_2"
    vlayer = QgsVectorLayer(uri, filename, "delimitedtext")
    QgsProject.instance().addMapLayer(vlayer)

    mode = QgsVectorLayerTemporalProperties.ModeFeatureDateTimeStartAndEndFromExpressions
    expression = """to_datetime(field_1) -
    make_interval(seconds:=minimum(epoch(to_datetime("field_1")))/1000)
    """

    tprops = vlayer.temporalProperties()
    tprops.setStartExpression(expression)
    tprops.setEndExpression(expression) # optional
    tprops.setMode(mode)
    tprops.setIsActive(True)

for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        load_and_configure(filename)

The above script loads all CSV files from the given directory (field_1 is the timestamp, field_2 is y, and field_3 is x), enables sets the start and end expression as well as the corresponding temporal control mode and finally activates temporal rendering. The resulting config can be verified in the layer properties dialog:

To adapt this script to other datasets, it’s sufficient to change the file directory and revisit the layer uri definition as well as the field names referenced in the expression.


This post is part of a series. Read more about movement data in GIS.

TimeManager is dead, long live the Temporal Controller!

TimeManager turns 10 this year. The code base has made the transition from QGIS 1.x to 2.x and now 3.x and it would be wrong to say that it doesn’t show ;-)

Now, it looks like the days of TimeManager are numbered. Four days ago, Nyall Dawson has added native temporal support for vector layers to QGIS. This is part of a larger effort of adding time support for rasters, meshes, and now also vectors.

The new Temporal Controller panel looks similar to TimeManager. Layers are configured through the new Temporal tab in Layer Properties. The temporal dimension can be used in expressions to create fancy time-dependent styles:

temporal1

TimeManager Geolife demo converted to Temporal Controller (click for full resolution)

Obviously, this feature is brand new and will require polishing. Known issues listed by Nyall include limitations of supported time fields (only fields with datetime type are supported right now, strings cannot be used) and worse performance than TimeManager since features are filtered in QGIS rather than in the backend.

If you want to give the new Temporal Controller a try, you need to install the current development version, e.g. qgis-dev in OSGeo4W.


Update from May 16:

Many of the limitations above have already been addressed.

Last night, Nyall has recorded a one hour tutorial on this new feature, enjoy:

Super-quick interactive data & parameter exploration

This post introduces Holoviz Panel, a library that makes it possible to create really quick dashboards in notebook environments as well as more sophisticated custom interactive web apps and dashboards.

The following example shows how to use Panel to explore a dataset (a trajectory collection in this case) and different parameter settings (relating to trajectory generalization). All the Panel code we need is a dict that defines the parameters that we want to explore. Then we can use Panel’s interact function to automatically generate a dashboard for our custom plotting function:

import panel as pn

kw = dict(traj_id=(1, len(traj_collection)), 
          tolerance=(10, 100, 10), 
          generalizer=['douglas-peucker', 'min-distance'])
pn.interact(plot_generalized, **kw)

Click to view the resulting dashboard in full resolution:

The plotting function uses the parameters to generate a Holoviews plot. First it fetches a specific trajectory from the trajectory collection. Then it generalizes the trajectory using the specified parameter settings. As you can see, we can easily combine maps and other plots to visualize different aspects of the data:

def plot_generalized(traj_id=1, tolerance=10, generalizer='douglas-peucker'):
  my_traj = traj_collection.get_trajectory(traj_id).to_crs(CRS(4088))
  if generalizer=='douglas-peucker':
    generalized = mpd.DouglasPeuckerGeneralizer(my_traj).generalize(tolerance)
  else:
    generalized = mpd.MinDistanceGeneralizer(my_traj).generalize(tolerance)
  generalized.add_speed(overwrite=True)
  return ( 
    generalized.hvplot(
      title='Trajectory {} (tolerance={})'.format(my_traj.id, tolerance), 
      c='speed', cmap='Viridis', colorbar=True, clim=(0,20), 
      line_width=10, width=500, height=500) + 
    generalized.df['speed'].hvplot.hist(
      title='Speed histogram', width=300, height=500) 
    )

Trajectory collections and generalization functions used in this example are part of the MovingPandas library. If you are interested in movement data analysis, you should check it out! You can find this example notebook in the MovingPandas tutorial section.

Movement data in GIS #28: open geospatial tools for movement data exploration

We recently published a new paper on “Open Geospatial Tools for Movement Data Exploration” (open access). If you liked Movement data in GIS #26: towards a template for exploring movement data, you will find even more information about the context, challenges, and recent developments in this paper.

It also presents three open source stacks for movement data exploration:

  1. QGIS + PostGIS: a combination that will be familiar to most open source GIS users
  2. Jupyter + MovingPandas: less common so far, but Jupyter notebooks are quickly gaining popularity (even in the proprietary GIS world)
  3. GeoMesa + Spark: for when datasets become too big to handle using other means

and discusses their capabilities and limitations:


This post is part of a series. Read more about movement data in GIS.

First working MovingPandas setup on Databricks

In December, I wrote about GeoPandas on Databricks. Back then, I also tried to get MovingPandas working but without luck. (While GeoPandas can be installed using Databricks’ dbutils.library.installPyPI("geopandas") this PyPI install just didn’t want to work for MovingPandas.)

Now that MovingPandas is available from conda-forge, I gave it another try and … *spoiler alert* … it works!

First of all, conda support on Databricks is in beta. It’s not included in the default runtimes. At the time of writing this post, “6.0 Conda Beta” is the latest runtime with conda:

Once the cluster is up and connected to the notebook, a quick conda list shows the installed packages:

Time to install MovingPandas! I went with a 100% conda-forge installation. This takes a looong time (almost half an hour)!

When the installs are finally done, it get’s serious: time to test the imports!

Success!

Now we can put the MovingPandas data structures to good use. But first we need to load some movement data:

Or course, the points in this GeoDataFrame can be plotted. However, the plot isn’t automatically displayed once plot() is called on the GeoDataFrame. Instead, Databricks provides a display() function to display Matplotlib figures:

MovingPandas also uses Matplotlib. Therefore we can use the same approach to plot the TrajectoryCollection that can be created from the GeoDataFrame:

These Matplotlib plots are nice and quick but they lack interactivity and therefore are of limited use for data exploration.

MovingPandas provides interactive plotting (including base maps) using hvplot. hvplot is based on Bokeh and, luckily, the Databricks documentation tells us that bokeh plots can be exported to html and then displayed using  displayHTML():

Of course, we could achieve all this on MyBinder as well (and much more quickly). However, Databricks gets interesting once we can add (Py)Spark and distributed processing to the mix. For example, “Getting started with PySpark & GeoPandas on Databricks” shows a spatial join function that adds polygon information to a point GeoDataFrame.

A potential use case for MovingPandas would be to speed up flow map computations. The recently added aggregator functionality (currently in master only) first computes clusters of significant trajectory points and then aggregates the trajectories into flows between these clusters. Matching trajectory points to the closest cluster could be a potential use case for distributed computing. Each trajectory (or each point) can be handled independently, only the cluster locations have to be broadcast to all workers.

Flow map (screenshot from MovingPandas tutorial 4_generalization_and_aggregation.ipynb)

 

  • Page 1 of 3 ( 53 posts )
  • >>
  • spatio-temporal data

Back to Top

Sustaining Members