Related Plugins and Tags

QGIS Planet

Building an interactive app with geocoding in Jupyter Lab

This post aims to show you how to create quick interactive apps for prototyping and data exploration using Panel.

Specifically, the following example demos how to add geocoding functionality based on Geopy and Nominatim. As such, this example brings together tools we’ve previously touched on in Super-quick interactive data & parameter exploration and Geocoding with Geopy.

Here’s a quick preview of the resulting app in action:

To create this app, I defined a single function called my_plot which takes the address and desired buffer size as input parameters. Using Panel’s interact and servable methods, I’m then turning this function into the interactive app you’ve seen above:

import panel as pn
from geopy.geocoders import Nominatim
from utils.converting import location_to_gdf
from utils.plotting import hvplot_with_buffer

locator = Nominatim(user_agent="OGD.AT-Lab")

def my_plot(user_input="Giefinggasse 2, 1210 Wien", buffer_meters=1000):
    location = locator.geocode(user_input)
    geocoded_gdf = location_to_gdf(location, user_input)
    map_plot = hvplot_with_buffer(geocoded_gdf, buffer_meters, 
                                  title=f'Geocoded address with {buffer_meters}m buffer')
    return map_plot.opts(active_tools=['wheel_zoom']) 

kw = dict(user_input="Giefinggasse 2, 1210 Wien", buffer_meters=(0,10000))

pn.template.FastListTemplate(
    site="Panel", title="Geocoding Demo", 
    main=[pn.interact(my_plot, **kw)]
).servable();

You can find the full notebook in the OGD.AT Lab repository or run this notebook directly on MyBinder:

To open the Panel preview, press the green Panel button in the Jupyter Lab toolbar:

I really enjoy building spatial data exploration apps this way, because I can start off with a Jupyter notebook and – once I’m happy with the functionality – turn it into a pretty app that provides a user-friendly exterior and hides the underlying complexity that might scare away stakeholders.

Give it a try and share your own adventures. I’d love to see what you come up with.

Dynamic Infographic Map Tutorial

This is a guest post by Mickael HOARAU @Oneil974

As an update of the tutorial from previous years, I created a tutorial showing how to make a simple and dynamic color map with charts in QGIS.

In this tutorial you can see some of interesting features of QGIS and its community plugins. Here you’ll see variables, expressions, filters, QuickOSM and DataPlotly plugins and much more. You just need to use QGIS 3.24 Tisler version.

Here is the tutorial.

New OGC Moving Features JSON support in MovingPandas

First time, we talked about the OGC Moving Features standard in a post from 2017. Back then, we looked at the proposed standard way to encode trajectories in CSV and discussed its issues. Since then, the Moving Features working group at OGC has not been idle. Besides the CSV and XML encodings, they have designed a new JSON encoding that addresses many of the downsides of the previous two. You can read more about this in our 2020 preprint “From Simple Features to Moving Features and Beyond”.

Basically Moving Features JSON (MF-JSON) is heavily inspired by GeoJSON and it comes with a bunch of mandatory and optional key/value pairs. There is support for static properties as well as dynamic temporal properties and, of course, temporal geometries (yes geometries, not just points).

I think this format may have an actual chance of gaining more widespread adoption.

Image source: http://www.opengis.net/doc/BP/mf-json/1.0

Inspired by Pandas.read_csv() and GeoPandas.read_file(), I’ve started implementing a read_mf_json() function in MovingPandas. So far, it supports basic MovingFeature JSONs with MovingPoint geometry:

You’ll need to use the current development version to test this feature.

Next steps will be MovingFeatureCollection JSONs and support for static as well as temporal properties. We’ll have to see if MovingPandas can be extended to go beyond moving point geometries. Storing moving linestrings and polygons in the GeoDataFrame will be the simple part but analytics and visualization will certainly be more tricky.


This post is part of a series. Read more about movement data in GIS.

2021 wrap-up: baba und foi net!

What a ride.

This year has been both extremely rewarding and incredibly frustrating, sometimes both in very short succession.

I’ve finally finished my PhD dissertation and – between movement data analysis and open source and open data science talks in general – I’ve been counting over ten invited talks and conference presentations, including keynotes at FOSS4G and GI_Forum. Unfortunately, all of these were limited to virtual experiences and therefore often lacked much of the social interaction off stage that usually makes giving talks rewarding. But FOSS4G2021 was a refreshing exception to this rule:

One of the rare in-person events I attended this year was the Futurezone 2020 Award ceremony which – of course – had been postponed to 2021. The award took me completely by surprise:

This year’s focus on talks meant that there haven’t been many blog posts with original content this year, an unfortunate situation I hope to improve in 2022.

Without wanting to promise to much, there are quite a few interesting MovingPandas collaborations in the works that will hopefully result in exciting new features, demos, and tutorials:

On the plus side, with so many virtual events – from conferences to community events such as QGIS Open Days – much of the content formerly exclusively available to participants on-site have been recorded. Some worthwhile accounts and playlists include:

Happy streaming, happy new year, and – if you can – get vaccinated!

I’ll be having a nerd party.

New MovingPandas website

The last couple of days, I have been hacking away to improve the online presence of MovingPandas.

The new home page aims to be the central landing page that provides direct links to all important resources: from source code on Github, to documentation on ReadTheDocs, and – most importantly – all the tutorial and analysis example notebooks:

Additionally, all tutorial and analysis example notebooks now contain direct links to live versions on MyBinder, sources on Github and already executed pre-rendered HTML versions of the notebooks for quick browsing:

If you are using MovingPandas, I’d love to hear about it, particularly if you want to share one of your analysis examples with the community.

Open source for open spatial data science

Thanks to the FOSS4G2021 video team, all talks including my keynote are now available online.

I had the honor to be invited to give the closing keynote, talking about how open source can help open science, particularly data science:

I’m convinced that efforts towards more open data science are a worthwhile investment even if current scientific incentive structures are stacked against it.

Until incentive policies catch up, we all can help encourage more people to go the extra mile(s) by properly valuing their efforts, e.g. by celebrating and citing reproducible publications, open research datasets, and open scientific software.

Exploring ZAMG’s new open weather data

The Central Institution for Meteorology and Geodynamics (ZAMG) is Austrian’s meteorological and geophysical service. And as such, they have a large database of historical weather data which they have now made publicly available, as announced on 28th Oct 2021:

The new ZAMG Data Hub provides weather and station data, mainly in NetCDF and CSV formats:

I decided to grab a NetCDF sample from their analysis and nowcasting system INCA. I went with all available parameters for a period of one day (the data has a temporal resolution of one hour) and a bounding box around Vienna:

https://frontend.hub.zamg.ac.at/grid/d512d5b5-4e9f-4954-98b9-806acbf754f6/historical/form?anonymous=true

The loading screen of QGIS 3.22 shows the different NetCDF layers:

After adding the incal-hourly layer to QGIS, the layer styling panel provides access to the different weather parameters. We can switch between these parameters by clicking the gradient icon next to the parameter names. Here you can see the air temperature:

And because the NetCDF layer is time-aware, we can also use the QGIS Temporal Controller to step through the hourly measurements / create an animation:

Make sure to grab the latest version of QGIS to get access to all the functionality shown here.

Movement data in GIS #36: trucks from space

Can we reliably measure truck traffic from space? Compared to private transport, spatiotemporal data on freight transport is even harder to come by. Detecting trucks using remote sensing has been a promising lead for many years but often required access to pretty specialized sensors, such as TerraSAR-X. That is why I was really excited to read about a new approach that detects trucks in commonly available Sentinel-2 imagery developed by Henrik Fisser (Julius-Maximilians-University Würzburg, Germany). So I reached out to him to learn more about the possibilities this new technology opens up. 

Vehicles are visible and detectable in Sentinel-2 data if they are large and moving fast enough (image source: ESA)

To verify his truck detection results. Henrik had already used data from truck counting stations along the German autobahn network. However, these counters are quite rare and thus cannot provide full spatial coverage. Therefore we started looking for more complete reference data. Fortunately, Nikolaus Kapser at the Austrian highway corporation ASFINAG offered his help. The Austrian autobahn toll system is gantry-based. It records when a truck passes a gantry. Using the timestamp of these truck passages and the current traffic speed, it is possible to estimate truck locations at arbitrary points in time, such as the time a Sentinel-2 image was taken. This makes it possible to assess the Sentinel-2-based truck detection along the autobahn network for complete Sentinel-2 images.

Overall, Sentinel-2-based detections tend to underestimate the number of trucks. Henrik found a strong correlation (with an average r value > 0.8) between German traffic counting stations and trucks detected by the Sentinel-2 method. These counting stations were selected for their ideal characteristics, including distance from volatile traffic situations such as a high number of highway intersections. This is very different from our comparison which covers autobahn sections in and near Vienna. We therefore expected larger detection errors. However, our new Austrian analysis reaches similar results (with r values of 0.79, 0.70, and 0.86 for three different days 2020-08-28, 2020-09-22, and 2020-11-06).

Thanks to the truck reference locations provided by ASFINAG, we were also able to analyze the spatial distribution of truck detections. We decided to compare ASFINAG data (truth) and Sentinel-2-based detections using a grid based approach with a cell size of 5×5 km. Confirming Henrik’s original results, grid cells with higher detection than ground truth values are clearly in the minority. Interestingly, many cells in Vienna (at the eastern border of the image extent) exhibit rather low relative errors compared to, for example, the cells along Westautobahn (the east-west running autobahn in the center of the image extent).

Some important remarks: The Sentinel-2-based detection method only works for large vehicles moving around 50km/h or faster. It is hence less suited to detect trucks in city traffic. Additionally, trucks in tunnel sections cannot be detected. To enable a fair comparison, we therefore flagged trucks in the ground truth dataset that were located in tunnels and excluded them from the analysis. Sentinel-2 captures the region around Vienna around 10:00 o’clock in the morning. As a result, it is not possible to assess other times of day. Finally, cloud cover will reduce the accuracy. Therefore we picked images with low reported cloud cover percentage (< 5%).

It is really exciting to finally see a truck detection method that works with readily available remote sensing data because this means that it is potentially transferable to other areas of the world where no official traffic counts are available. Furthermore, this method should be in line with data protection regulations (avoiding identification of individuals and potential reconstruction of movement trajectories) thus making it possible to use and publish the resulting data without further anonymization steps.


This post was written in collaboration with Henrik Fisser (Uni Würzburg / DLR) and Nikolaus Kasper (Asfinag MSG). Keep your eyes open for upcoming detailed publications on the Sentinel-2-based method by Henrik.


This post is part of a series. Read more about movement data in GIS.

Video recommendations from FOSDEM 2021

The Geospatial Dev Room at FOSDEM 2021 was a great event that (virtually) brought together a very diverse group of geo people.

All talk recordings are now available publicly at: fosdem.org/2021/schedule/track/geospatial

In line with the main themes of this blog, I’d particularly like to highlight the following three talks:

MoveTK: the movement toolkit A library for understanding movement by Aniket Mitra

Telegram Bot For Navigation: A perfect map app for a neighbourhood doesn’t need a map by Ilya Zverev

Spatial data exploration in Jupyter notebooks by yours truly

Introducing the open data analysis OGD.AT Lab

Data sourcing and preparation is one of the most time consuming tasks in many spatial analyses. Even though the Austrian data.gv.at platform already provides a central catalog, the individual datasets still vary considerably in their accessibility or readiness for use.

OGD.AT Lab is a new repository collecting Jupyter notebooks for working with Austrian Open Government Data and other auxiliary open data sources. The notebooks illustrate different use cases, including so far:

  1. Accessing geodata from the city of Vienna WFS
  2. Downloading environmental data (heat vulnerability and air quality)
  3. Geocoding addresses and getting elevation information
  4. Exploring urban movement data

Data processing and visualization are performed using Pandas, GeoPandas, and Holoviews. GeoPandas makes it straighforward to use data from WFS. Therefore, OGD.AT Lab can provide one universal gdf_from_wfs() function which takes the desired WFS layer as an argument and returns a GeoPandas.GeoDataFrame that is ready for analysis:

Many other datasets are provided as CSV files which need to be joined with spatial datasets to use them in spatial analysis. For example, the “Urban heat vulnerability index” dataset which needs to be joined to statistical areas.

 

Another issue with many CSV files is that they use German number formatting, where commas are used as a decimal separater instead of dots:

Besides file access, there are also open services provided by other developers, for example, Manfred Egger developed an elevation service that provides elevation information for any point in Austria. In combination with geocoding services, such as Nominatim, this makes is possible to, for example, find the elevation for any address in Austria:

Last but not least, the first version of the mobility notebook showcases open travel time data provided by Uber Movement:

The utility functions for data access included in this repository will continue to grow as new data sources are included. Eventually, it may make sense to extract the data access function into a dedicated library, similar to geofi (Finland) or geobr (Brazil).

If you’re aware of any interesting open datasets or services that should be included in OGD.AT, feel free to reach out here or on Github through the issue tracker or by providing a pull request.

Sol Katz Award – Thank you

On Thursday, I was awarded the 2020 Sol Katz award for Free and Open Source Software for Geospatial. I feel very honored to have been selected for this award and I’d like to take this opportunity to share a few words of thanks:

As people working in open source projects, we are constantly reminded that we are all standing on the shoulders of giants. However, particularly this year, we also see just how important small personal connections are. For me, my involvement with open source communities really took off when I joined the QGIS hackfest in Vienna in 2009 and I felt that my participation was really appreciated and welcome. I couldn’t imagine being without these connections anymore.

Thank you to the whole QGIS community, particularly my fellow PSC members both current and former: Tim, Andreas, Jürgen, Richard, Paolo, Otto, Marco Hugentobler, Alessandro, our new chair Marco Bernasocchi, and of course QGIS founder Gary Sherman for starting this awesome project and for still being around and actively promoting geospatial open source by publishing so many great books covering multiple different OSGeo projects.

I’d also like to thank my partner and my family for being incredibly understanding whenever I’m spend my time geeking out over a new programming project, data analysis, forum question, or conference talk.

Thank you also to my friends, colleagues and fellow members of the larger OSGeo community for sharing ideas, providing valuable feedback, and spreading the word about all the great work that’s going on all around us.

I’m constantly amazed by all the innovation happening to nourish and grow our community. And I’m looking forward to continue being a part of these efforts.

Thank you!

 

Extracting trajectory-based flows between M³ prototypes

Rendering large sets of trajectory lines gets messy fast. Different aggregation approaches have been developed to address this issue. However, most approaches, such as mobility graphs or generalized flow maps, cannot handle large input datasets. Building on M³ prototypes, the following approach can be used in distributed computing environments to extracts flows from large datasets. 

This is part 3 of “Exploring massive movement datasets”.

This flow extraction is based on a two-step process, conceptually similar to Andrienko flow maps: first, we extract M³ prototypes from the movement data. In the second step, we determine flows between these prototypes, including information about: distribution of travel speeds and number of observed transitions. The resulting flows can be visualized, for example, to explore the popularity of different paths of movement:

After the prototypes have been computed, the flow algorithm computes transitions between pairs of prototypes. An object moving from prototype A to prototype B triggers an update of the corresponding flow. To allow for distributed processing, each node in the distributed computing environment needs a copy of the previously computed prototypes. Additionally, the raw movement data records need to be converted into trajectories. Afterwards, each trajectory is processed independently, going through its records in chronological order:

  1. Find the best matching prototype for the current record
  2. Ensure that the distance to the match is below the distance threshold and that the matched prototype is different from the previous prototype
  3. Get or create the flow between the two prototypes
  4. Ensure that the prototype and flow directions are a good match for the current record’s direction
  5. Update the flow properties: travel speed and number of transitions, as well as the previous prototype reference

This approach scales to large datasets since only the prototypes, the (intermediate) flow results, and the trajectory currently being worked on have to be kept in memory for each iteration. However, this algorithm does not allow for continuous updates. Flows would have to be recomputed (at least locally) whenever prototypes changed. Therefore, the algorithm does not support exploration of continuous data streams. However, it can be used to explore large historical datasets:

Flow example: passenger vessel speed patterns showing mean flow speeds (line color: darker colors equal higher speeds) and speed variation (line width)

If you want to dive deeper, here’s the full paper:

[1] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). Extracting Patterns from Large Movement Datasets. GI_Forum – Journal of Geographic Information Science, 1-2020, 153-163. doi:10.1553/giscience2020_01_s153.


This post is part of a series. Read more about movement data in GIS.

Generating trajectories from massive movement datasets

To explore travel patterns like origin-destination relationships, we need to identify individual trips with their start/end locations and trajectories between them. Extracting these trajectories from large datasets can be challenging, particularly if the records of individual moving objects don’t fit into memory anymore and if the spatial and temporal extent varies widely (as is the case with ship data, where individual vessel journeys can take weeks while crossing multiple oceans). 

This is part 2 of “Exploring massive movement datasets”.

Roughly speaking, trip trajectories can be generated by first connecting consecutive records into continuous tracks and then splitting them at stops. This general approach applies to many different movement datasets. However, the processing details (e.g. stop detection parameters) and preprocessing steps (e.g. removing outliers) vary depending on input dataset characteristics.

For example, in our paper [1], we extracted vessel journeys from AIS data which meant that we also had to account for observation gaps when ships leave the observable (usually coastal) areas. In the accompanying 10-minute talk, I went through a 4-step trajectory exploration workflow for assessing our dataset’s potential for travel time prediction:

Click to watch the recorded talk

Like the M³ prototype computation presented in part 1, our trajectory aggregation approach is implemented in Spark. The challenges are both the massive amounts of trajectory data and the fact that operations only produce correct results if applied to a complete and chronologically sorted set of location records.This is challenging because Spark core libraries (version 2.4.5 at the time) are mostly geared towards dealing with unsorted data. This means that, when using high-level Spark core functionality incorrectly, an aggregator needs to collect and sort the entire track in the main memory of a single processing node. Consequently, when dealing with large datasets, out-of-memory errors are frequently encountered.

To solve this challenge, our implementation is based on the Secondary Sort pattern and on Spark’s aggregator concept. Secondary Sort takes care to first group records by a key (e.g. the moving object id), and only in the second step, when iterating over the records of a group, the records are sorted (e.g. chronologically). The resulting iterator can be used by an aggregator that implements the logic required to build trajectories based on gaps and stops detected in the dataset.

If you want to dive deeper, here’s the full paper:

[1] Graser, A., Dragaschnig, M., Widhalm, P., Koller, H., & Brändle, N. (2020). Exploratory Trajectory Analysis for Massive Historical AIS Datasets. In: 21st IEEE International Conference on Mobile Data Management (MDM) 2020. doi:10.1109/MDM48529.2020.00059


This post is part of a series. Read more about movement data in GIS.

Movement data in GIS #31: exploring massive movement datasets

Exploring large movement datasets is hard because visualizations of movement data quickly get cluttered and hard to interpret. Therefore, we need to aggregate the data. Density maps are commonly used since they are readily available and quick to compute but they provide only very limited insight. In contrast, meaningful aggregations that can help discover patterns are computationally expensive and therefore slow to generate.

This post serves as a starting point for a series of new approaches to exploring massive movement data. This series will summarize parts of my PhD research and – for those of you who are interested in more details – there will be links to the relevant papers.

Starting with the raw location records, we use different forms of aggregation to learn more about what information a movement dataset contains:

  1. Summarizing movement using prototypes by aggregating raw location records using our flexible M³ Massive Movement Model [1]
  2. Generating trajectories by connecting consecutive records into continuous tracks and splitting them into meaningful trajectories [2]
  3. Extracting flows by summarizing trajectory-based transitions between prototypes [3]

Besides clever aggregation approaches, massive movement datasets also require appropriate computing resources. To ensure that we can efficiently explore large datasets, we have implemented the above mentioned aggregation steps in Spark. This enables us to run the computations on general purpose computing clusters that can be scaled according to the dataset size.

In the next post, we’ll look at how to summarize movement using M³ prototypes. So stay tuned!

But if you don’t want to wait, these are the original papers:

[1] Graser. A., Widhalm, P., & Dragaschnig, M. (2020). The M³ massive movement model: a distributed incrementally updatable solution for big movement data exploration. International Journal of Geographical Information Science. doi:10.1080/13658816.2020.1776293.
[2] Graser, A., Dragaschnig, M., Widhalm, P., Koller, H., & Brändle, N. (2020). Exploratory Trajectory Analysis for Massive Historical AIS Datasets. In: 21st IEEE International Conference on Mobile Data Management (MDM) 2020. doi:10.1109/MDM48529.2020.00059
[3] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). Extracting Patterns from Large Movement Datasets. GI_Forum – Journal of Geographic Information Science, 1-2020, 153-163. doi:10.1553/giscience2020_01_s153.


This post is part of a series. Read more about movement data in GIS.

Spatial on air: talking Python on the MapScaping Podcast

Podcasts have become huge. I’m an avid listener of podcasts myself. I particularly enjoy formats that take the time to talk about unconventional topics in detail.

My first podcast experience was on the QGIS podcast hosted by Tim Sutton in 2014. Unfortunately, it seems like the podcast episodes are not online anymore.

Recently, I had the pleasure to join the MapScaping Podcast by Daniel O’Donohue to talk about Python for Geospatial: 

Other guests Daniel has already interviewed include:

Another geospatial podcast I really enjoy is The Mappyist Hour by Silas and Todd. Unfortunately, it’s a bit silent there now but it’s definitely worth to listen into their episode archive. One of my favorites is Episode 9 where Linda Stevens (Hecht) discusses her career at ESRI, the future of GIS, and the role of Open Source Spatial in that future:

If you listen to and want to recommend other spatial podcasts, please share them in the comments!

Movement data in GIS #29: power your web apps with movement data using mobilitydb-sqlalchemy

This is a guest post by Bommakanti Krishna Chaitanya @chaitan94

Introduction

This post introduces mobilitydb-sqlalchemy, a tool I’m developing to make it easier for developers to use movement data in web applications. Many web developers use Object Relational Mappers such as SQLAlchemy to read/write Python objects from/to a database.

Mobilitydb-sqlalchemy integrates the moving objects database MobilityDB into SQLAlchemy and Flask. This is an important step towards dealing with trajectory data using appropriate spatiotemporal data structures rather than plain spatial points or polylines.

To make it even better, mobilitydb-sqlalchemy also supports MovingPandas. This makes it possible to write MovingPandas trajectory objects directly to MobilityDB.

For this post, I have made a demo application which you can find live at https://mobilitydb-sqlalchemy-demo.adonmo.com/. The code for this demo app is open source and available on GitHub. Feel free to explore both the demo app and code!

In the following sections, I will explain the most important parts of this demo app, to show how to use mobilitydb-sqlalchemy in your own webapp. If you want to reproduce this demo, you can clone the demo repository and do a “docker-compose up –build” as it automatically sets up this docker image for you along with running the backend and frontend. Just follow the instructions in README.md for more details.

Declaring your models

For the demo, we used a very simple table – with just two columns – an id and a tgeompoint column for the trip data. Using mobilitydb-sqlalchemy this is as simple as defining any regular table:

from flask_sqlalchemy import SQLAlchemy
from mobilitydb_sqlalchemy import TGeomPoint

db = SQLAlchemy()

class Trips(db.Model):
   __tablename__ = "trips"
   trip_id = db.Column(db.Integer, primary_key=True)
   trip = db.Column(TGeomPoint)

Note: The library also allows you to use the Trajectory class from MovingPandas as well. More about this is explained later in this tutorial.

Populating data

When adding data to the table, mobilitydb-sqlalchemy expects data in the tgeompoint column to be a time indexed pandas dataframe, with two columns – one for the spatial data  called “geometry” with Shapely Point objects and one for the temporal data “t” as regular python datetime objects.

from datetime import datetime
from shapely.geometry import Point

# Prepare and insert the data
# Typically it won’t be hardcoded like this, but it might be coming from 
# other data sources like a different database or maybe csv files
df = pd.DataFrame(
   [
       {"geometry": Point(0, 0), "t": datetime(2012, 1, 1, 8, 0, 0),},
       {"geometry": Point(2, 0), "t": datetime(2012, 1, 1, 8, 10, 0),},
       {"geometry": Point(2, -1.9), "t": datetime(2012, 1, 1, 8, 15, 0),},
   ]
).set_index("t")

trip = Trips(trip_id=1, trip=df)
db.session.add(trip)
db.session.commit()

Writing queries

In the demo, you see two modes. Both modes were designed specifically to explain how functions defined within MobilityDB can be leveraged by our webapp.

1. All trips mode – In this mode, we extract all trip data, along with distance travelled within each trip, and the average speed in that trip, both computed by MobilityDB itself using the ‘length’, ‘speed’ and ‘twAvg’ functions. This example also shows that MobilityDB functions can be chained to form more complicated queries.

mobilitydb-sqlalchemy-demo-1

trips = db.session.query(
   Trips.trip_id,
   Trips.trip,
   func.length(Trips.trip),
   func.twAvg(func.speed(Trips.trip))
).all()

2. Spatial query mode – In this mode, we extract only selective trip data, filtered by a user-selected region of interest. We then make a query to MobilityDB to extract only the trips which pass through the specified region. We use MobilityDB’s ‘intersects’ function to achieve this filtering at the database level itself.

mobilitydb-sqlalchemy-demo-2

trips = db.session.query(
   Trips.trip_id,
   Trips.trip,
   func.length(Trips.trip),
   func.twAvg(func.speed(Trips.trip))
).filter(
   func.intersects(Point(lat, lng).buffer(0.01).wkb, Trips.trip),
).all()

Using MovingPandas Trajectory objects

Mobilitydb-sqlalchemy also provides first-class support for MovingPandas Trajectory objects, which can be installed as an optional dependency of this library. Using this Trajectory class instead of plain DataFrames allows us to make use of much richer functionality over trajectory data like analysis speed, interpolation, splitting and simplification of trajectory points, calculating bounding boxes, etc. To make use of this feature, you have set the use_movingpandas flag to True while declaring your model, as shown in the below code snippet.

class TripsWithMovingPandas(db.Model):
   __tablename__ = "trips"
   trip_id = db.Column(db.Integer, primary_key=True)
   trip = db.Column(TGeomPoint(use_movingpandas=True))

Now when you query over this table, you automatically get the data parsed into Trajectory objects without having to do anything else. This also works during insertion of data – you can directly assign your movingpandas Trajectory objects to the trip column. In the below code snippet we show how inserting and querying works with movingpandas mode.

from datetime import datetime
from shapely.geometry import Point

# Prepare and insert the data
# Typically it won’t be hardcoded like this, but it might be coming from 
# other data sources like a different database or maybe csv files
df = pd.DataFrame(
   [
       {"geometry": Point(0, 0), "t": datetime(2012, 1, 1, 8, 0, 0),},
       {"geometry": Point(2, 0), "t": datetime(2012, 1, 1, 8, 10, 0),},
       {"geometry": Point(2, -1.9), "t": datetime(2012, 1, 1, 8, 15, 0),},
   ]
).set_index("t")

geo_df = GeoDataFrame(df)
traj = mpd.Trajectory(geo_df, 1)

trip = Trips(trip_id=1, trip=traj)
db.session.add(trip)
db.session.commit()

# Querying over this table would automatically map the resulting tgeompoint 
# column to movingpandas’ Trajectory class
result = db.session.query(TripsWithMovingPandas).filter(
   TripsWithMovingPandas.trip_id == 1
).first()

print(result.trip.__class__)
# <class 'movingpandas.trajectory.Trajectory'>

Bonus: trajectory data serialization

Along with mobilitydb-sqlalchemy, recently I have also released trajectory data serialization/compression libraries based on Google’s Encoded Polyline Format Algorithm, for python and javascript called trajectory and trajectory.js respectively. These libraries let you send trajectory data in a compressed format, resulting in smaller payloads if sending your data through human-readable serialization formats like JSON. In some of the internal APIs we use at Adonmo, we have seen this reduce our response sizes by more than half (>50%) sometimes upto 90%.

Want to learn more about mobilitydb-sqlalchemy? Check out the quick start & documentation.


This post is part of a series. Read more about movement data in GIS.

Folium vs. hvplot for interactive maps of Point GeoDataFrames

In the previous post, I showed how Folium can be used to create interactive maps of GeoPandas GeoDataFrames. Today’s post continues this theme. Specifically, it compares Folium to another dataviz library called hvplot. hvplot also recently added support for GeoDataFrames, so it’s interesting to see how these different solutions compare.

Minimum viable

The following snippets show the minimum code I found to put a GeoDataFrame of Points onto a map with either Folium or hvplot.

Folium does not automatically zoom to the data extent and I didn’t find a way to add the whole GeoDataFrame of Points without looping through the rows individually:

Hvplot on the other hand registers the hvplot function directly with the GeoDataFrame. This makes it as convenient to use as the original GeoPandas plot function. It also zooms to the data extent:

Standard interaction and zoom to area of interest

The following snippets ensure that the map is set to a useful extent and the map tools enable panning and zooming.

With Folium, we have to set the map center and the zoom. The map tools are Leaflet defaults, so panning and zooming work as expected:

Since hvplot does not come with mouse wheel zoom enabled by default, we need to set that:

Color by attribute

Finally, for many maps, we want to show the point location as well as an attribute value.

To create a continuous color ramp for a numeric value, we can use branca.colormap to define the marker fill color:

In hvplot, it is sufficient to specify the attribute of interest:

I’m pretty impressed with hvplot. The integration with GeoPandas is very smooth. Just don’t forget to set the geo=True parameter if you want to plot lat/lon geometries.

Folium seems less straightforward for this use case. Maybe I missed some option similar to the Choropleth function that I showed in the previous post.

Interactive plots for GeoPandas GeoDataFrames of LineStrings

GeoPandas makes it easy to create basic visualizations of GeoDataFrames:

However, if we want interactive plots, we need additional libraries. Folium (which is built on Leaflet) is a great option. However, all examples for plotting GeoDataFrames that I found focused on point or polygon data. So here is what I found to work for GeoDataFrames of LineStrings:

First, some imports:

import pandas as pd
import geopandas
import folium

Loading the data:

graph = geopandas.read_file('data/population_test-routes-geom.csv')
graph.crs = {'init' :'epsg:4326'}

Creating the map using folium.Choropleth:

m = folium.Map([48.2, 16.4], zoom_start=10)

folium.Choropleth(
    graph[graph.geometry.length>0.001],
    line_weight=3,
    line_color='blue'
).add_to(m)

m

I also tried using folium.PolyLine which seemed like the more obvious choice but does not seem to accept GeoDataFrames as input. Instead, it expects a list of coordinate pairs and of course it expects them to be in the opposite order that Shapely.LineString.coords provides … Oh the joys of geodata!

In any case, I had to limit the number of features that get plotted because Folium refuses to plot all 8778 features at once. I decided to filter by line length because drawing really short lines is pointless for my overview visualization anyway.

Stand-alone PyQGIS scripts with OSGeo4W

PyQGIS scripts are great to automate spatial processing workflows. It’s easy to run these scripts inside QGIS but it can be even more convenient to run PyQGIS scripts without even having to launch QGIS. To create a so-called “stand-alone” PyQGIS script, there are a few things that need to be taken care of. The following steps show how to set up PyCharm for stand-alone PyQGIS development on Windows10 with OSGeo4W.

An essential first step is to ensure that all environment variables are set correctly. The most reliable approach is to go to C:\OSGeo4W64\bin (or wherever OSGeo4W is installed on your machine), make a copy of qgis-dev-g7.bat (or any other QGIS version that you have installed) and rename it to pycharm.bat:

Instead of launching QGIS, we want that pycharm.bat launches PyCharm. Therefore, we edit the final line in the .bat file to start pycharm64.exe:

In PyCharm itself, the main task to finish our setup is configuring the project interpreter:

First, we add a new “system interpreter” for Python 3.7 using the corresponding OSGeo4W Python installation.

To finish the interpreter config, we need to add two additional paths pointing to QGIS\python and QGIS\python\plugins:

That’s it! Now we can start developing our stand-alone PyQGIS script.

The following example shows the necessary steps, particularly:

  1. Initializing QGIS
  2. Initializing Processing
  3. Running a Processing algorithm
import sys

from qgis.core import QgsApplication, QgsProcessingFeedback
from qgis.analysis import QgsNativeAlgorithms

QgsApplication.setPrefixPath(r'C:\OSGeo4W64\apps\qgis-dev', True)
qgs = QgsApplication([], False)
qgs.initQgis()

# Add the path to processing so we can import it next
sys.path.append(r'C:\OSGeo4W64\apps\qgis-dev\python\plugins')
# Imports usually should be at the top of a script but this unconventional 
# order is necessary here because QGIS has to be initialized first
import processing
from processing.core.Processing import Processing

Processing.initialize()
QgsApplication.processingRegistry().addProvider(QgsNativeAlgorithms())
feedback = QgsProcessingFeedback()

rivers = r'D:\Documents\Geodata\NaturalEarthData\Natural_Earth_quick_start\10m_physical\ne_10m_rivers_lake_centerlines.shp'
output = r'D:\Documents\Geodata\temp\danube3.shp'
expression = "name LIKE '%Danube%'"

danube = processing.run(
    'native:extractbyexpression',
    {'INPUT': rivers, 'EXPRESSION': expression, 'OUTPUT': output},
    feedback=feedback
    )['OUTPUT']

print(danube)

Dealing with delayed measurements in (Geo)Pandas

Yesterday, I learned about a cool use case in data-driven agriculture that requires dealing with delayed measurements. As Bert mentions, for example, potatoes end up in the machines and are counted a few seconds after they’re actually taken out of the ground:

Therefore, in order to accurately map yield, we need to take this temporal offset into account.

We need to make sure that time and location stay untouched, but need to shift the potato count value. To support this use case, I’ve implemented apply_offset_seconds() for trajectories in movingpandas:

    def apply_offset_seconds(self, column, offset):
        self.df[column] = self.df[column].shift(offset, freq='1s')

The following test illustrates its use: you can see how the value column is shifted by 120 second. Geometry and time remain unchanged but the value column is shifted accordingly. In this test, we look at the row with index 2 which we access using iloc[2]:

    def test_offset_seconds(self):
        df = pd.DataFrame([
            {'geometry': Point(0, 0), 't': datetime(2018, 1, 1, 12, 0, 0), 'value': 1},
            {'geometry': Point(-6, 10), 't': datetime(2018, 1, 1, 12, 1, 0), 'value': 2},
            {'geometry': Point(6, 6), 't': datetime(2018, 1, 1, 12, 2, 0), 'value': 3},
            {'geometry': Point(6, 12), 't': datetime(2018, 1, 1, 12, 3, 0), 'value':4},
            {'geometry': Point(6, 18), 't': datetime(2018, 1, 1, 12, 4, 0), 'value':5}
        ]).set_index('t')
        geo_df = GeoDataFrame(df, crs={'init': '31256'})
        traj = Trajectory(1, geo_df)
        traj.apply_offset_seconds('value', -120)
        self.assertEqual(traj.df.iloc[2].value, 5)
        self.assertEqual(traj.df.iloc[2].geometry, Point(6, 6))

Back to Top

Sustaining Members