Related Plugins and Tags

QGIS Planet

LLM-based spatial analysis assistants for QGIS

After the initial ChatGPT hype in 2023 (when we saw the first LLM-backed QGIS plugins, e.g. QChatGPT and QGPT Agent), there has been a notable slump in new development. As far as I can tell, none of the early plugins are actively maintained anymore. They were nice tech demos but with limited utility.

However, in the last month, I saw two new approaches for combining LLMs with QGIS that I want to share in this post:

IntelliGeo plugin: generating PyQGIS scripts or graphical models

At the QGIS User Conference in Bratislava, I had the pleasure to attend the “Large Language Models and GIS” workshop presented by Gustavo Garcia and Zehao Lu from the the University of Twente. There, they presented the IntelliGeo Plugin which enables the automatic generation of PyQGIS scripts and graphical models.

The workshop was packed. After we installed all dependencies and the plugin, it was exciting to test the graphical model generation capabilities. During the workshop, we used OpenAI’s API but the readme also mentions support for Cohere.

I was surprised to learn that even simple graphical models are actually pretty large files. This makes it very challenging to generate and/or modify models because they take up a big part of the LLM’s context window. Therefore, I expect that the PyQGIS script generation will be easier to achieve. But, of course, model generation would be even more impressive and useful since models are easier to edit for most users than code.

Image source: https://github.com/MahdiFarnaghi/intelli_geo

ChatGeoAI: chat with PyQGIS

ChatGeoAI is an approach presented in Mansourian, A.; Oucheikh, R. (2024). ChatGeoAI: Enabling Geospatial Analysis for Public through Natural Language, with Large Language Models. ISPRS Int. J. Geo-Inf.13, 348.

It uses a fine-tuned Llama 2 model in combination with spaCy for entity recognition and WorldKG ontology to write PyQGIS code that can perform a variety of different geospatial analysis tasks on OpenStreetMap data.

The paper is very interesting, describing the LLM fine-tuning, integration with QGIS, and evaluation of the generated code using different metrics. However, as far as I can tell, the tool is not publicly available and, therefore, cannot be tested.

Image source: https://www.mdpi.com/2220-9964/13/10/348

Are you aware of more examples that integrate QGIS with LLMs? Please share them in the comments below. I’d love to hear about them.

Adding basemaps to PyQGIS maps in Jupyter notebooks

In the previous post, we investigated how to bring QGIS maps into Jupyter notebooks.

Today, we’ll take the next step and add basemaps to our maps. This is trickier than I would have expected. In particular, I was fighting with “invalid” OSM tile layers until I realized that my QGIS application instance somehow lacked the “WMS” provider.

In addition, getting basemaps to work also means that we have to take care of layer and project CRSes and on-the-fly reprojections. So let’s get to work:

from IPython.display import Image
from PyQt5.QtGui import QColor
from PyQt5.QtWidgets import QApplication
from qgis.core import QgsApplication, QgsVectorLayer, QgsProject, QgsRasterLayer, \
    QgsCoordinateReferenceSystem, QgsProviderRegistry, QgsSimpleMarkerSymbolLayerBase
from qgis.gui import QgsMapCanvas
app = QApplication([])
qgs = QgsApplication([], False)
qgs.setPrefixPath(r"C:\temp", True)  # setting a prefix path should enable the WMS provider
qgs.initQgis()
canvas = QgsMapCanvas()
project = QgsProject.instance()
map_crs = QgsCoordinateReferenceSystem('EPSG:3857')
canvas.setDestinationCrs(map_crs)
print("providers: ", QgsProviderRegistry.instance().providerList())

To add an OSM basemap, we use the xyz tiles option of the WMS provider:

urlWithParams = 'type=xyz&url=https://tile.openstreetmap.org/{z}/{x}/{y}.png&zmax=19&zmin=0&crs=EPSG3857'
rlayer = QgsRasterLayer(urlWithParams, 'OpenStreetMap', 'wms')  
print(rlayer.crs())
if rlayer.isValid():
    project.addMapLayer(rlayer)
else:
    print('invalid layer')
    print(rlayer.error().summary()) 

If there are issues with the WMS provider, rlayer.error().summary() should point them out.

With both the vector layer and the basemap ready, we can finally plot the map:

canvas.setExtent(rlayer.extent())
plot_layers([vlayer,rlayer])

Of course, we can get more creative and style our vector layers:

vlayer.renderer().symbol().setColor(QColor("yellow"))
vlayer.renderer().symbol().symbolLayer(0).setShape(QgsSimpleMarkerSymbolLayerBase.Star)
vlayer.renderer().symbol().symbolLayer(0).setSize(10)
plot_layers([vlayer,rlayer])

And to switch to other basemaps, we just need to update the URL accordingly, for example, to load Carto tiles instead:

urlWithParams = 'type=xyz&url=http://basemaps.cartocdn.com/dark_all/{z}/{x}/{y}.png&zmax=19&zmin=0&crs=EPSG3857'
rlayer2 = QgsRasterLayer(urlWithParams, 'Carto', 'wms')  
print(rlayer2.crs())
if rlayer2.isValid():
    project.addMapLayer(rlayer2)
else:
    print('invalid layer')
    print(rlayer2.error().summary()) 
    
plot_layers([vlayer,rlayer2])

You can find the whole notebook at: https://github.com/anitagraser/QGIS-resources/blob/master/qgis3/notebooks/basemaps.ipynb

Plotting large point CSV files quickly & interactively

Even with all their downsides, CSV files are still a common data exchange format – particularly between disciplines with different tech stacks. Indeed, “How to Specify Data Types of CSV Columns for Use in QGIS” (originally written in 2011) is still one of the most popular posts on this blog. QGIS continues to be quite handy for visualizing CSV file contents. However, there are times when it’s just not enough, particularly when the number of rows in the CSV is in the range of multiple million. The following example uses a 12 million point CSV:

To give you an idea of the waiting times in QGIS, I’ve run the following script which loads and renders the CSV:

from datetime import datetime

def get_time():
    t2 = datetime.now()
    print(t2)
    print(t2-t1)
    print('Done :)')

canvas = iface.mapCanvas()
canvas.mapCanvasRefreshed.connect(get_time)

print('Starting ...')

t0 = datetime.now()
print(t0)

print('Loading CSV ...')

uri = "file:///E:/Geodata/AISDK/raw_ais/aisdk_20170701.csv?type=csv&xField=Longitude&yField=Latitude&crs=EPSG:4326&"
vlayer = QgsVectorLayer(uri, "layer name you like", "delimitedtext")

t1 = datetime.now()
print(t1)
print(t1 - t0)

print('Rendering ...')

QgsProject.instance().addMapLayer(vlayer)

The script output shows that creating the vector layer takes 02:39 minutes and rendering it takes over 05:10 minutes:

Starting ...
2020-12-06 12:35:56.266002
Loading CSV ...
2020-12-06 12:38:35.565332
0:02:39.299330
Rendering ...
2020-12-06 12:43:45.637504
0:05:10.072172
Done :)

Rendered CSV file in QGIS

Panning and zooming around are no fun either since rendering takes so long. Changing from a single symbol renderer to, for example, a heatmap renderer does not improve the rendering times. So we need a different solutions when we want to efficiently explore large point CSV files.

The Pandas data analysis library is well-know for being a convenient tool for handling CSVs. However, it’s less clear how to use it as a replacement for desktop GIS for exploring large CSVs with point coordinates. My favorite solution so far uses hvPlot + HoloViews + Datashader to provide interactive Bokeh plots in Jupyter notebooks.

hvPlot provides a high-level plotting API built on HoloViews that provides a general and consistent API for plotting data in (Geo)Pandas, xarray, NetworkX, dask, and others. (Image source: https://hvplot.holoviz.org)

But first things first! Loading the CSV as a Pandas Dataframe takes 10.7 seconds. Pandas’ default plotting function (based on Matplotlib), however, takes around 13 seconds and only produces a static scatter plot.

Loading and plotting the CSV with Pandas

hvPlot to the rescue!

We only need two more steps to get faster and interactive map plots (plus background maps!): First, we need to reproject the lat/lon values. (There’s a warning here, most likely since some of the input lat/lon values are invalid.) Then, we replace plot() with hvplot() and voilà:

Plotting the CSV with Datashader

As you can see from the above GIF, the whole process barely takes 2 seconds and the resulting map plot is interactive and very responsive.

12 million points are far from the limit. As long as the Pandas DataFrame fits into memory, we are good and when the datasets get bigger than that, there are Dask DataFrames. But that’s a story for another day.

Movement data in GIS #30: synchronized trajectory animations with QGIS temporal controller

QGIS Temporal Controller is a powerful successor of TimeManager. Temporal Controller is a new core feature of the current development version and will be shipped with the 3.14 release. This post demonstrates two key advantages of this new temporal support:

  1. Expression support for defining start and end timestamps
  2. Integration into the PyQGIS API

These features come in very handy in many use cases. For example, they make it much easier to create animations from folders full of GPS tracks since the files can now be loaded and configured automatically:

Script & Temporal Controller in action (click for full resolution)

All tracks start at the same location but at different times. (Kudos for Andrew Fletcher for recordings these tracks and sharing them with me!) To create an animation that shows all tracks start simultaneously, we need to synchronize them. This synchronization can be achieved on-the-fly by subtracting the start time from all track timestamps using an expression:

directory = "E:/Google Drive/QGIS_Course/05_TimeManager/Example_Dayrides/"

def load_and_configure(filename):
    path = os.path.join(directory, filename)
    uri = 'file:///' + path + "?type=csv&escape=&useHeader=No&detectTypes=yes"
    uri = uri + "&crs=EPSG:4326&xField=field_3&yField=field_2"
    vlayer = QgsVectorLayer(uri, filename, "delimitedtext")
    QgsProject.instance().addMapLayer(vlayer)

    mode = QgsVectorLayerTemporalProperties.ModeFeatureDateTimeStartAndEndFromExpressions
    expression = """to_datetime(field_1) -
    make_interval(seconds:=minimum(epoch(to_datetime("field_1")))/1000)
    """

    tprops = vlayer.temporalProperties()
    tprops.setStartExpression(expression)
    tprops.setEndExpression(expression) # optional
    tprops.setMode(mode)
    tprops.setIsActive(True)

for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        load_and_configure(filename)

The above script loads all CSV files from the given directory (field_1 is the timestamp, field_2 is y, and field_3 is x), enables sets the start and end expression as well as the corresponding temporal control mode and finally activates temporal rendering. The resulting config can be verified in the layer properties dialog:

To adapt this script to other datasets, it’s sufficient to change the file directory and revisit the layer uri definition as well as the field names referenced in the expression.


This post is part of a series. Read more about movement data in GIS.

PyQGIS Cookbook revision 2020

We are happy to announce that the PyQGIS Cookbook has received a complete overhaul and is now better than ever.

The PyQGIS Cookbook is a great source of information, not only for PyQGIS beginners or plugin developers, but also for C++ developers: it contains a lot of information about the internals of the QGIS API that you cannot really find anywhere else.

The main point addressed in this review were:

  1. All the code snippets have been reviewed and put under automated test in CI. Before this revision, there were 62 tests. Now there are over 300 tested snippets.
  2. A few snippets had to be updated because of changes in the QGIS API or because of the deprecation of methods (CRS handling in particular due to the proj6 switch).
  3. Textual descriptions have been edited to update the contents where the API has substantially changed.
  4. The material covering Python for QGIS Server has been reorganized and now includes new snippets and introductory texts about the new modules and OGC APIs architecture.

For the full summary, please refer to Alessandro Pasotti’s report on the PSC mailing list.

Thank you to Alessandro Pasotti for taking on this project and thank you to all our sponsor and donors who make this initiative possible!

Configure editing form widgets using PyQGIS

PT | EN

As I was preparing a QGIS Project to read a database structured according to the new rules and technical specifications for the Portuguese Cartography, I started to configure the editing forms for several layers, so that:

  1. Make some fields read-only, like for example an identifier field.
  2. Configure widgets better suited for each field, to help the user and avoid errors. For example, date-time files with a pop-up calendar, and value lists with dropdown selectors.

Basically, I wanted something like this:

Peek 2019-09-30 15-04_2

Let me say that, in PostGIS layers, QGIS does a great job in figuring out the best widget to use for each field, as well as the constraints to apply. Which is a great help. Nevertheless, some need some extra configuration.

If I had only a few layers and fields, I would have done them all by hand, but after the 5th layer my personal mantra started to chime in:

“If you are using a computer to perform a repetitive manual task, you are doing it wrong!”

So, I began to think how could I configure the layers and fields more systematically. After some research and trial and error, I came up with the following PyQGIS functions.

Make a field Read-only

The identifier field (“identificador”) is automatically generated by the database. Therefore, the user shouldn’t edit it. So I had better make it read only

Layer Properties - cabo_electrico | Attributes Form_103

To make all the identifier fields read-only, I used the following code.

def field_readonly(layer, fieldname, option = True):
    fields = layer.fields()
    field_idx = fields.indexOf(fieldname)
    if field_idx >= 0:
        form_config = layer.editFormConfig()
        form_config.setReadOnly(field_idx, option)
        layer.setEditFormConfig(form_config)

# Example for the field "identificador"

project = QgsProject.instance()
layers = project.mapLayers() 

for layer in layers.values():
    field_readonly(layer,'identificador')

Set fields with DateTime widget

The date fields are configured automatically, but the default widget setting only outputs the date, and not date-time, as the rules required.

I started by setting a field in a layer exactly how I wanted, then I tried to figure out how those setting were saved in PyQGIS using the Python console:

>>>layer = iface.mapCanvas().currentLayer()
>>>layer.fields().indexOf('inicio_objeto')
1
>>>field = layer.fields()[1]
>>>field.editorWidgetSetup().type()
'DateTime'
>>>field.editorWidgetSetup().config()
{'allow_null': True, 'calendar_popup': True, 'display_format': 'yyyy-MM-dd HH:mm:ss', 'field_format': 'yyyy-MM-dd HH:mm:ss', 'field_iso_format': False}

Knowing this, I was able to create a function that allows configuring a field in a layer using the exact same settings, and apply it to all layers.

def field_to_datetime(layer, fieldname):
    config = {'allow_null': True,
              'calendar_popup': True,
              'display_format': 'yyyy-MM-dd HH:mm:ss',
              'field_format': 'yyyy-MM-dd HH:mm:ss',
              'field_iso_format': False}
    type = 'Datetime'
    fields = layer.fields()
    field_idx = fields.indexOf(fieldname)
    if field_idx >= 0:
        widget_setup = QgsEditorWidgetSetup(type,config)
        layer.setEditorWidgetSetup(field_idx, widget_setup)

# Example applied to "inicio_objeto" e "fim_objeto"

for layer in layers.values():
    field_to_datetime(layer,'inicio_objeto')
    field_to_datetime(layer,'fim_objeto')

Setting a field with the Value Relation widget

In the data model, many tables have fields that only allow a limited number of values. Those values are referenced to other tables, the Foreign keys.

In these cases, it’s quite helpful to use a Value Relation widget. To configure fields with it in a programmatic way, it’s quite similar to the earlier example, where we first neet to set an example and see how it’s stored, but in this case, each field has a slightly different settings

Luckily, whoever designed the data model, did a favor to us all by giving the same name to the fields and the related tables, making it possible to automatically adapt the settings for each case.

The function stars by gathering all fields in which the name starts with ‘valor_’ (value). Then, iterating over those fields, adapts the configuration to use the reference layer that as the same name as the field.

def field_to_value_relation(layer):
    fields = layer.fields()
    pattern = re.compile(r'^valor_')
    fields_valor = [field for field in fields if pattern.match(field.name())]
    if len(fields_valor) > 0:
        config = {'AllowMulti': False,
                  'AllowNull': True,
                  'FilterExpression': '',
                  'Key': 'identificador',
                  'Layer': '',
                  'NofColumns': 1,
                  'OrderByValue': False,
                  'UseCompleter': False,
                   'Value': 'descricao'}
        for field in fields_valor:
            field_idx = fields.indexOf(field.name())
            if field_idx >= 0:
                print(field)
                try:
                    target_layer = QgsProject.instance().mapLayersByName(field.name())[0]
                    config['Layer'] = target_layer.id()
                    widget_setup = QgsEditorWidgetSetup('ValueRelation',config)
                    layer.setEditorWidgetSetup(field_idx, widget_setup)
                except:
                    pass
            else:
                return False
    else:
        return False
    return True
    
# Correr função em todas as camadas
for layer in layers.values():
    field_to_value_relation(layer)

Conclusion

In a relatively quick way, I was able to set all the project’s layers with the widgets I needed.Peek 2019-09-30 16-06

This seems to me like the tip of the iceberg. If one has the need, with some search and patience, other configurations can be changed using PyQGIS. Therefore, think twice before embarking in configuring a big project, layer by layer, field by fields.

Stand-alone PyQGIS scripts with OSGeo4W

PyQGIS scripts are great to automate spatial processing workflows. It’s easy to run these scripts inside QGIS but it can be even more convenient to run PyQGIS scripts without even having to launch QGIS. To create a so-called “stand-alone” PyQGIS script, there are a few things that need to be taken care of. The following steps show how to set up PyCharm for stand-alone PyQGIS development on Windows10 with OSGeo4W.

An essential first step is to ensure that all environment variables are set correctly. The most reliable approach is to go to C:\OSGeo4W64\bin (or wherever OSGeo4W is installed on your machine), make a copy of qgis-dev-g7.bat (or any other QGIS version that you have installed) and rename it to pycharm.bat:

Instead of launching QGIS, we want that pycharm.bat launches PyCharm. Therefore, we edit the final line in the .bat file to start pycharm64.exe:

In PyCharm itself, the main task to finish our setup is configuring the project interpreter:

First, we add a new “system interpreter” for Python 3.7 using the corresponding OSGeo4W Python installation.

To finish the interpreter config, we need to add two additional paths pointing to QGIS\python and QGIS\python\plugins:

That’s it! Now we can start developing our stand-alone PyQGIS script.

The following example shows the necessary steps, particularly:

  1. Initializing QGIS
  2. Initializing Processing
  3. Running a Processing algorithm
import sys

from qgis.core import QgsApplication, QgsProcessingFeedback
from qgis.analysis import QgsNativeAlgorithms

QgsApplication.setPrefixPath(r'C:\OSGeo4W64\apps\qgis-dev', True)
qgs = QgsApplication([], False)
qgs.initQgis()

# Add the path to processing so we can import it next
sys.path.append(r'C:\OSGeo4W64\apps\qgis-dev\python\plugins')
# Imports usually should be at the top of a script but this unconventional 
# order is necessary here because QGIS has to be initialized first
import processing
from processing.core.Processing import Processing

Processing.initialize()
QgsApplication.processingRegistry().addProvider(QgsNativeAlgorithms())
feedback = QgsProcessingFeedback()

rivers = r'D:\Documents\Geodata\NaturalEarthData\Natural_Earth_quick_start\10m_physical\ne_10m_rivers_lake_centerlines.shp'
output = r'D:\Documents\Geodata\temp\danube3.shp'
expression = "name LIKE '%Danube%'"

danube = processing.run(
    'native:extractbyexpression',
    {'INPUT': rivers, 'EXPRESSION': expression, 'OUTPUT': output},
    feedback=feedback
    )['OUTPUT']

print(danube)

On custom layout checks in QGIS 3.6, and how they can do your work for you!

Recently, we had the opportunity to implement an exciting new feature within QGIS. An enterprise with a large number of QGIS installs was looking for a way to control the outputs which staff were creating from the software, and enforce a set of predefined policies. The policies were designed to ensure that maps created in QGIS’ print layout designer would meet a set of minimum standards, e.g.:

  • Layouts must include a “Copyright 2019 by XXX” label somewhere on the page
  • All maps must have a linked scale bar
  • No layers from certain blacklisted sources (e.g. Google Maps tiles) are permitted
  • Required attribution text for other layers must be included somewhere on the layout

Instead of just making a set of written policies and hoping that staff correctly follow them, it was instead decided that the checks should be performed automatically by QGIS itself. If any of the checks failed (indicating that the map wasn’t complying to the policies), the layout export would be blocked and the user would be advised what they needed to change in their map to make it compliant.

The result of this work is a brand new API for implementing custom “validity checks” within QGIS. Out of the box, QGIS 3.6 ships with two in-built validity checks. These are:

  • A check to warn users when a layout includes a scale bar which isn’t linked to a map
  • A check to warn users if a map overview in a layout isn’t linked to a map (e.g. if the linked map has been deleted)

All QGIS 3.6 users will see a friendly warning if either of these conditions are met, advising them of the potential issue.

 

The exciting stuff comes in custom, in-house checks. These are written in PyQGIS, so they can be deployed through in-house plugins or startup scripts. Let’s explore some examples to see how these work.

A basic check looks something like this:

from qgis.core import check

@check.register(type=QgsAbstractValidityCheck.TypeLayoutCheck)
def my_layout_check(context, feedback):
  results = ...
  return results

Checks are created using the @check.register decorator. This takes a single argument, the check type. For now, only layout checks are implemented, so this should be set to QgsAbstractValidityCheck.TypeLayoutCheck. The check function is given two arguments, a QgsValidityCheckContext argument, and a feedback argument. We can safely ignore the feedback option for now, but the context argument is important. This context contains information useful for the check to run — in the case of layout checks, the context contains a reference to the layout being checked. The check function should return a list of QgsValidityCheckResult objects, or an empty list if the check was passed successfully with no warnings or errors.

Here’s a more complete example. This one throws a warning whenever a layout map item is set to the web mercator (EPSG:3875) projection:

@check.register(type=QgsAbstractValidityCheck.TypeLayoutCheck)
def layout_map_crs_choice_check(context, feedback):
  layout = context.layout
  results = []
  for i in layout.items():
    if isinstance(i, QgsLayoutItemMap) and i.crs().authid() == 'EPSG:3857':
      res = QgsValidityCheckResult()
      res.type = QgsValidityCheckResult.Warning
      res.title='Map projection is misleading'
      res.detailedDescription='The projection for the map item {} is set to Web Mercator (EPSG:3857) which misrepresents areas and shapes. Consider using an appropriate local projection instead.'.format(i.displayName())
      results.append(res)

  return results

Here, our check loops through all the items in the layout being tested, looking for QgsLayoutItemMap instances. It then checks the CRS for each map, and if that CRS is ‘EPSG:3857’, a warning result is returned. The warning includes a friendly message for users advising them why the check failed.

In this example our check is returning results with a QgsValidityCheckResult.Warning type. Warning results are shown to users, but they don’t prevent users from proceeding and continuing to export their layout.

Checks can also return “critical” results. If any critical results are obtained, then the actual export itself is blocked. The user is still shown the messages generated by the check so that they know how to resolve the issue, but they can’t proceed with the export until they’ve fixed their layout. Here’s an example of a check which returns critical results, preventing layout export if there’s no “Copyright 2019 North Road” labels included on their layout:

@check.register(type=QgsAbstractValidityCheck.TypeLayoutCheck)
def layout_map_crs_choice_check(context, feedback):
  layout = context.layout
  for i in layout.items():
    if isinstance(i, QgsLayoutItemLabel) and 'Copyright 2019 North Road' in i.currentText():
      return

  # did not find copyright text, block layout export
  res = QgsValidityCheckResult()
  res.type = QgsValidityCheckResult.Critical
  res.title = 'Missing copyright label'
  res.detailedDescription = 'Layout has no "Copyright" label. Please add a label containing the text "Copyright 2019 North Road".'
  return [res]

If we try to export a layout with the copyright notice, we now get this error:

Notice how the OK button is disabled, and users are forced to fix the error before they can export their layouts.

Here’s a final example. This one runs through all the layers included within maps in the layout, and if any of them come from a “blacklisted” source, the user is not permitted to proceed with the export:

@check.register(type=QgsAbstractValidityCheck.TypeLayoutCheck)
def layout_map_crs_choice_check(context, feedback):
  layout = context.layout
  for i in layout.items():
    if isinstance(i, QgsLayoutItemMap):
      for l in i.layersToRender():
        # check if layer source is blacklisted
        if 'mt1.google.com' in l.source():
          res = QgsValidityCheckResult()
          res.type = QgsValidityCheckResult.Critical
          res.title = 'Blacklisted layer source'
          res.detailedDescription = 'This layout includes a Google maps layer ("{}"), which is in violation of their Terms of Service.'.format(l.name())
          return [res]

Of course, all checks are run each time — so if a layout fails multiple checks, the user will see a summary of ALL failed checks, and can click on each in turn to see the detailed description of the failure.

So there we go — when QGIS 3.6 is released in late February 2019, you’ll  have access to this API and can start making QGIS automatically enforce your organisation policies for you! The really neat thing is that this doesn’t only apply to large organisations. Even if you’re a one-person shop using QGIS, you could write your own checks to  make QGIS “remind” you when you’ve forgotten to include something in your products. It’d even be possible to hook into one of the available Python spell checking libraries to write a spelling check! With any luck, this should lead to better quality outputs and less back and forth with your clients.

North Road are leading experts in customising the QGIS application for enterprise installs. If you’d like to discuss how you can deploy in-house customisation like this within your organisation, contact us for further details!

PyQGIS101 part 10 published!

PyQGIS 101: Introduction to QGIS Python programming for non-programmers has now reached the part 10 milestone!

Beyond the obligatory Hello world! example, the contents so far include:

If you’ve been thinking about learning Python programming, but never got around to actually start doing it, give PyQGIS101 a try.

I’d like to thank everyone who has already provided feedback to the exercises. Every comment is important to help me understand the pain points of learning Python for QGIS.

I recently read an article – unfortunately I forgot to bookmark it and cannot locate it anymore – that described the problems with learning to program very well: in the beginning, it’s rather slow going, you don’t know the right terminology and therefore don’t know what to google for when you run into issues. But there comes this point, when you finally get it, when the terminology becomes clearer, when you start thinking “that might work” and it actually does! I hope that PyQGIS101 will be a help along the way.

Thoughts on “FOSS4G/SOTM Oceania 2018”, and the PyQGIS API improvements which it caused

Last week the first official “FOSS4G/SOTM Oceania” conference was held at Melbourne University. This was a fantastic event, and there’s simply no way I can extend sufficient thanks to all the organisers and volunteers who put this event together. They did a brilliant job, and their efforts are even more impressive considering it was the inaugural event!

Upfront — this is not a recap of the conference (I’m sure someone else is working on a much more detailed write up of the event!), just some musings I’ve had following my experiences assisting Nathan Woodrow deliver an introductory Python for QGIS workshop he put together for the conference. In short, we both found that delivering this workshop to a group of PyQGIS newcomers was a great way for us to identify “pain points” in the PyQGIS API and areas where we need to improve. The good news is that as a direct result of the experiences during this workshop the API has been improved and streamlined! Let’s explore how:

Part of Nathan’s workshop (notes are available here) focused on a hands-on example of creating a custom QGIS “Processing” script. I’ve found that preparing workshops is guaranteed to expose a bunch of rare and tricky software bugs, and this was no exception! Unfortunately the workshop was scheduled just before the QGIS 3.4.2 patch release which fixed these bugs, but at least they’re fixed now and we can move on…

The bulk of Nathan’s example algorithm is contained within the following block (where “distance” is the length of line segments we want to chop our features up into):

for input_feature in enumerate(features):
    geom = feature.geometry().constGet()
    if isinstance(geom, QgsLineString):
        continue
    first_part = geom.geometryN(0)
    start = 0
    end = distance
    length = first_part.length()

    while start < length:
        new_geom = first_part.curveSubstring(start,end)

        output_feature = input_feature
        output_feature.setGeometry(QgsGeometry(new_geom))
        sink.addFeature(output_feature)

        start += distance
        end += distance

There’s a lot here, but really the guts of this algorithm breaks down to one line:

new_geom = first_part.curveSubstring(start,end)

Basically, a new geometry is created for each trimmed section in the output layer by calling the “curveSubstring” method on the input geometry and passing it a start and end distance along the input line. This returns the portion of that input LineString (or CircularString, or CompoundCurve) between those distances. The PyQGIS API nicely hides the details here – you can safely call this one method and be confident that regardless of the input geometry type the result will be correct.

Unfortunately, while calling the “curveSubstring” method is elegant, all the code surrounding this call is not so elegant. As a (mostly) full-time QGIS developer myself, I tend to look over oddities in the API. It’s easy to justify ugly API as just “how it’s always been”, and over time it’s natural to develop a type of blind spot to these issues.

Let’s start with the first ugly part of this code:

geom = input_feature.geometry().constGet()
if isinstance(geom, QgsLineString):
    continue
first_part = geom.geometryN(0)
# chop first_part into sections of desired length
...

This is rather… confusing… logic to follow. Here the script is fetching the geometry of the input feature, checking if it’s a LineString, and if it IS, then it skips that feature and continues to the next. Wait… what? It’s skipping features with LineString geometries?

Well, yes. The algorithm was written specifically for one workshop, which was using a MultiLineString layer as the demo layer. The script takes a huge shortcut here and says “if the input feature isn’t a MultiLineString, ignore it — we only know how to deal with multi-part geometries”. Immediately following this logic there’s a call to geometryN( 0 ), which returns just the first part of the MultiLineString geometry.

There’s two issues here — one is that the script just plain won’t work for LineString inputs, and the second is that it ignores everything BUT the first part in the geometry. While it would be possible to fix the script and add a check for the input geometry type, put in logic to loop over all the parts of a multi-part input, etc, that’s instantly going to add a LOT of complexity or duplicate code here.

Fortunately, this was the perfect excuse to improve the PyQGIS API itself so that this kind of operation is simpler in future! Nathan and I had a debrief/brainstorm after the workshop, and as a result a new “parts iterator” has been implemented and merged to QGIS master. It’ll be available from version 3.6 on. Using the new iterator, we can simplify the script:

geom = input_feature.geometry()
for part in geom.parts():
    # chop part into sections of desired length
    ...

Win! This is simultaneously more readable, more Pythonic, and automatically works for both LineString and MultiLineString inputs (and in the case of MultiLineStrings, we now correctly handle all parts).

Here’s another pain-point. Looking at the block:

new_geom = part.curveSubstring(start,end)
output_feature = input_feature
output_feature.setGeometry(QgsGeometry(new_geom))

At first glance this looks reasonable – we use curveSubstring to get the portion of the curve, then make a copy of the input_feature as output_feature (this ensures that the features output by the algorithm maintain all the attributes from the input features), and finally set the geometry of the output_feature to be the newly calculated curve portion. The ugliness here comes in this line:

output_feature.setGeometry(QgsGeometry(new_geom))

What’s that extra QgsGeometry(…) call doing here? Without getting too sidetracked into the QGIS geometry API internals, QgsFeature.setGeometry requires a QgsGeometry argument, not the QgsAbstractGeometry subclass which is returned by curveSubstring.

This is a prime example of a “paper-cut” style issue in the PyQGIS API. Experienced developers know and understand the reasons behind this, but for newcomers to PyQGIS, it’s an obscure complexity. Fortunately the solution here was simple — and after the workshop Nathan and I added a new overload to QgsFeature.setGeometry which accepts a QgsAbstractGeometry argument. So in QGIS 3.6 this line can be simplified to:

output_feature.setGeometry(new_geom)

Or, if you wanted to make things more concise, you could put the curveSubstring call directly in here:

output_feature = input_feature
output_feature.setGeometry(part.curveSubstring(start,end))

Let’s have a look at the simplified script for QGIS 3.6:

for input_feature in enumerate(features):
    geom = feature.geometry()
    for part in geom.parts():
        start = 0
        end = distance
        length = part.length()

        while start < length:
            output_feature = input_feature
            output_feature.setGeometry(part.curveSubstring(start,end))
            sink.addFeature(output_feature)

            start += distance
            end += distance

This is MUCH nicer, and will be much easier to explain in the next workshop! The good news is that Nathan has more niceness on the way which will further improve the process of writing QGIS Processing script algorithms. You can see some early prototypes of this work here:

So there we go. The process of writing and delivering a workshop helps to look past “API blind spots” and identify the ugly points and traps for those new to the API. As a direct result of this FOSS4G/SOTM Oceania 2018 Workshop, the QGIS 3.6 PyQGIS API will be easier to use, more readable, and less buggy! That’s a win all round!

PyQGIS for non-programmers

If you’re are following me on Twitter, you’ve certainly already read that I’m working on PyQGIS 101 a tutorial to help GIS users to get started with Python programming for QGIS.

I’ve often been asked to recommend Python tutorials for beginners and I’ve been surprised how difficult it can be to find an engaging tutorial for Python 3 that does not assume that the reader already knows all kinds of programming concepts.

It’s been a while since I started programming, but I do teach QGIS and Python programming for QGIS to university students and therefore have some ideas of which concepts are challenging. Nonetheless, it’s well possible that I overlook something that is not self explanatory. If you’re using PyQGIS 101 and find that some points could use further explanations, please leave a comment on the corresponding page.

PyQGIS 101 is a work in progress. I’d appreciate any feedback, particularly from beginners!

New PyQGIS documentation

We are proud to announce our new dedicated documentation of the QGIS Python API (also called PyQGIS) which is now available at https://qgis.org/pyqgis:

While the QGIS API has long been documented, Python developers in the past had to work with the general C++ documentation that wasn’t always straightforward to use. The new PyQGIS documentation presents the API in an accessible pythonic manner.

Of course, creating a good API documentation from source code in an automated way, is not trivial. A key challenge was to automatically create Python bindings files (or SIP files). A custom Perl script known as “sipify” now enables us to automatically integrate the C++ documentation into the Python bindings and keep them up to date. Another challenge was to create the documentation itself using Sphinx. Two detailed reports containing all the technical details of the first and second generation of the documentation are available if you want to learn more about the underlying architecture.

This has been a really important infrastructure project for QGIS that has been made possible by support from our donors and sponsors, as well as the generous in-kind contributions of our community members.

Where's my .qgis3 Folder?

There's been several posts to GIS StackExchange along the lines of:

Where's my .qgis3 folder?

Prior to QGIS 3, the .qgis/.qgis2 folder was found under your home directory. At version 3, the folder has moved to a more standard profile location for your operating system.

There are a couple of ways to determine where the folder is located:

  • Use the Settings->User Profiles->Open active profile folder menu item
  • Use QgsApplication.qgisSettingsDirPath from Python or the console

Here are the "standard" locations for Linux, Mac, and Windows, as found under your HOME directory:

  • Linux:
    • .local/share/QGIS/QGIS3/profiles/default
  • Mac OS X:
    • Library/Application Support/QGIS/QGIS3/profiles/default
  • Windows:
    • AppData\Roaming\QGIS\QGIS3\profiles\default

To get the location of your plugins directory, just add python/plugins to the appropriate location above. For example:

AppData\Roaming\QGIS\QGIS3\profiles\default\python\plugins

From the Settings->User Profiles menu, you'll notice a New profile item. This allows you to have multiple configurations of QGIS 3. Each new profile is created in the same "base" location as listed above. For example:

AppData\Roaming\QGIS\QGIS3\profiles\new_profile

Quick Guide to Getting Started with PyQGIS 3 on Windows

Getting started with Python and QGIS 3 can be a bit overwhelming. In this post we give you a quick start to get you up and running and maybe make your PyQGIS life a little easier.

There are likely many ways to setup a working PyQGIS development environment---this one works pretty well.

Contents

Requirements

  • OSGeo4W Advanced Install of QGIS
  • pip (for installing/managing Python packages)
  • pb_tool (cross-platform tool for compiling/deploying/distributing QGIS plugin)
  • A customized startup script to set the environment (pyqgis.cmd)
  • IDE (optional)
  • Emacs (just kidding)
  • Vim (just kidding)

We'll start with the installs.

Installing

Almost everything we need can be installed using the OSGeo4W installer available on the QGIS website.

OSGeo4W

From the QGIS website, download the appropriate network installer (32 or 64 bit) for QGIS 3.

  • Run the installer and choose the Advanced Install option
  • Install from Internet
  • Choose a directory for the install---I prefer a path without spaces such as C:\OSGeo4W
  • Accept default for local package directory and Start menu name
  • Tweak network connection option if needed on the Select Your Internet Connection screen
  • Accept default download site location
  • From the Select packages screen, select: Desktop -> qgis: QGIS Desktop

When you click Next a bunch of additional packages will be suggested---just accept them and continue the install.

Once complete you will have a functioning QGIS install along with the other parts we need. If you want to work with the nightly build of QGIS, choose Desktop -> qgis-dev instead.

If you installed QGIS using the standalone installer, the easiest option is to remove it and install from OSGeo4W. You can run both the standalone and OSGeo4W versions on the same machine, but you need to be extra careful not to mix up the environment.

Setting the Environment

To continue with the setup, we need to set the environment by creating a .cmd script. The following is adapted from several sources, and trimmed down to the minimum. Copy and paste it into a file named pyqgis.cmd and save it to a convenient location (like your HOME directory).

@echo off
SET OSGEO4W_ROOT=C:\OSGeo4W3
call "%OSGEO4W_ROOT%"\bin\o4w_env.bat
call "%OSGEO4W_ROOT%"\apps\grass\grass-7.4.0\etc\env.bat
@echo off
path %PATH%;%OSGEO4W_ROOT%\apps\qgis-dev\bin
path %PATH%;%OSGEO4W_ROOT%\apps\grass\grass-7.4.0\lib
path %PATH%;C:\OSGeo4W3\apps\Qt5\bin
path %PATH%;C:\OSGeo4W3\apps\Python36\Scripts

set PYTHONPATH=%PYTHONPATH%;%OSGEO4W_ROOT%\apps\qgis-dev\python
set PYTHONHOME=%OSGEO4W_ROOT%\apps\Python36

set PATH=C:\Program Files\Git\bin;%PATH%

cmd.exe

You should customize the set PATH statement to add any paths you want available when working from the command line. I added paths to my git install.

The last line starts a cmd shell with the settings specified above it. We'll see an example of starting an IDE in a bit.

You can test to make sure all is well by double-clicking on our pyqgis.cmd script, then starting Python and attempting to import one of the QGIS modules:

C:\Users\gsherman>python3
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (In tel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import qgis.core
>>> import PyQt5.QtCore

If you don't get any complaints on import, things are looking good.

Installing pb_tool

Open your customized shell (double-click on pyqgis.cmd to start it) to install pb_tool:

python3 -m pip install pb_tool

Check to see if pb_tool is installed correctly:

C:\Users\gsherman>pb_tool
Usage: pb_tool [OPTIONS] COMMAND [ARGS]...

  Simple Python tool to compile and deploy a QGIS plugin. For help on a
  command use --help after the command: pb_tool deploy --help.

  pb_tool requires a configuration file (default: pb_tool.cfg) that declares
  the files and resources used in your plugin. Plugin Builder 2.6.0 creates
  a config file when you generate a new plugin template.

  See http://g-sherman.github.io/plugin_build_tool for for an example config
  file. You can also use the create command to generate a best-guess config
  file for an existing project, then tweak as needed.

  Bugs and enhancement requests, see:
  https://github.com/g-sherman/plugin_build_tool

Options:
  --help  Show this message and exit.

Commands:
  clean       Remove compiled resource and ui files
  clean_docs  Remove the built HTML help files from the...
  compile     Compile the resource and ui files
  config      Create a config file based on source files in...
  create      Create a new plugin in the current directory...
  dclean      Remove the deployed plugin from the...
  deploy      Deploy the plugin to QGIS plugin directory...
  doc         Build HTML version of the help files using...
  help        Open the pb_tools web page in your default...
  list        List the contents of the configuration file
  translate   Build translations using lrelease.
  update      Check for update to pb_tool
  validate    Check the pb_tool.cfg file for mandatory...
  version     Return the version of pb_tool and exit
  zip         Package the plugin into a zip file suitable...

If you get an error, make sure C:\OSGeo4W3\apps\Python36\Scripts is in your PATH.

More information on using pb_tool is available on the project website.

Working on the Command Line

Just double-click on your pyqgis.cmd script from the Explorer or a desktop shortcut to start a cmd shell. From here you can use Python interactively and also use pb_tool to compile and deploy your plugin for testing.

IDE Example

By adding one line to our pyqgis.cmd script, we can start our IDE with the proper settings to recognize the QGIS libraries:

start "PyCharm aware of Quantum GIS" /B "C:\Program Files (x86)\JetBrains\PyCharm 3.4.1\bin\pycharm.exe" %*

We added the start statement with the path to the IDE (in this case PyCharm). If you save this to something like pycharm.cmd, you can double-click on it to start PyCharm. The same method works for other IDEs, such as PyDev.

Within your IDE settings, point it to use the Python interpreter included with OSGeo4W---typically at: %OSGEO4W_ROOT%\bin\python3.exe. This will make it pick up all the QGIS goodies needed for development, completion, and debugging. In my case OSGEO4W_ROOT is C:\OSGeo4W3, so in the IDE, the path to the correct Python interpreter would be: C:\OSGeo4W3\bin\python3.exe.

Make sure you adjust the paths in your .cmd scripts to match your system and software locations.

Workflow

Here is an example of a workflow you can use once you're setup for development.

Creating a New Plugin

  1. Use the Plugin Builder plugin to create a starting point [1]
  2. Start your pyqgis.cmd shell
  3. Use pb_tool to compile and deploy the plugin (pb_tool deploy will do it all in one pass)
  4. Activate it in QGIS and test it out
  5. Add code, deploy, test, repeat

Working with Existing Plugin Code

The steps are basically the same was creating a new plugin, except we start by using pb_tool to create a new config file:

  1. Start your pyqgis.cmd shell
  2. Change to the directory containing your plugin code
  3. Use pb_tool create to create a config file
  4. Edit pb_tool.cfg to adjust/add things create may have missed
  5. Start at step 3 in Creating a New Plugin and press on

Troubleshooting

Assuming you have things properly installed, trouble usually stems from an incorrect environment.

  • Make sure QGIS runs and the Python console is available and working
  • Check all the paths in your pygis.cmd or your custom IDE cmd script
  • Make sure your IDE is using the Python interpreter that comes with OSGeo4W


[1] Plugin Builder 3.x generates a pb_tool config file

Porting Processing scripts to QGIS3

I’ll start with some tech talk first. Feel free to jump to the usage example further down if you are here for the edge bundling plugin.

As you certainly know, QGIS 3 brings a lot of improvements and under-the-hood changes. One of those changes affects all Python scripts. They need to be updated to Python 3 and the new PyQGIS API. (See the official migration guide for details.)

To get ready for the big 3.0 release, I’ve started porting my Processing tools. The edge bundling script is my first candidate for porting to QGIS 3. I also wanted to use this opportunity to “upgrade” from a simple script to a plugin that integrates into Processing.

I used Alexander Bruy’s “prepair for Processing” plugin as a template but you can also find an example template in your Processing folder. (On my system, it is located in C:\OSGeo4W64\apps\qgis-dev\python\plugins\processing\algs\exampleprovider.)

Since I didn’t want to miss the advantages of a good IDE, I set up PyCharm as described by Heikki Vesanto. This will give you code completion for Python 3 and PyQGIS which is very helpful for refactoring and porting. (I also tried Eclipse with PyDev but if you don’t have a favorite IDE yet, I find PyCharm easier to install and configure.)

My PyCharm startup script qgis3_pycharm.bat is a copy of C:\OSGeo4W64\bin\python-qgis-dev.bat with the last line altered to start PyCharm:

@echo off
call "%~dp0\o4w_env.bat"
call qt5_env.bat
call py3_env.bat
@echo off<span data-mce-type="bookmark" style="display: inline-block; width: 0px; overflow: hidden; line-height: 0;" class="mce_SELRES_start"></span>
path %OSGEO4W_ROOT%\apps\qgis-dev\bin;%PATH%
set QGIS_PREFIX_PATH=%OSGEO4W_ROOT:\=/%/apps/qgis-dev
set GDAL_FILENAME_IS_UTF8=YES
rem Set VSI cache to be used as buffer, see #6448
set VSI_CACHE=TRUE
set VSI_CACHE_SIZE=1000000
set QT_PLUGIN_PATH=%OSGEO4W_ROOT%\apps\qgis-dev\qtplugins;%OSGEO4W_ROOT%\apps\qt5\plugins
set PYTHONPATH=%OSGEO4W_ROOT%\apps\qgis-dev\python;%PYTHONPATH%
start /d "C:\Program Files\JetBrains\PyCharm\bin\" pycharm64.exe

In PyCharm File | Settings, I configured the OSGeo4W Python 3.6 interpreter and added qgis-dev and the plugin folder to its path:

With this setup done, we can go back to the code.

I first resolved all occurrences of import * in my script to follow good coding practices. For example:

from qgis.core import *

became

from qgis.core import QgsFeature, QgsPoint, QgsVector, QgsGeometry, QgsField, QGis<span data-mce-type="bookmark" style="display: inline-block; width: 0px; overflow: hidden; line-height: 0;" class="mce_SELRES_start"></span>

in this PR.

I didn’t even run the 2to3 script that is provided to make porting from Python 2 to Python 3 easier. Since the edge bundling code is mostly Numpy, there were almost no changes necessary. The only head scratching moment was when Numpy refused to add a map() return value to an array. So (with the help of Stackoverflow of course) I added a work around to convert the map() return value to an array as well:

flocal_x = map(forcecalcx, subtr_x, subtr_y, distance)
electrostaticforces_x[e_idx, :] += np.array(list(flocal_x))

The biggest change related to Processing is that the VectorWriter has been replaced by a QgsFeatureSink. It’s defined as a parameter of the edgebundling QgsProcessingAlgorithm:

self.addParameter(QgsProcessingParameterFeatureSink(
   self.OUTPUT,
   self.tr("Bundled edges"),
   QgsProcessing.TypeVectorLine)
)

And when the algorithm is run, the sink is filled with the output features:

(sink, dest_id) = self.parameterAsSink(
   parameters, self.OUTPUT, context,
   source.fields(), source.wkbType(), source.sourceCrs()
)

# code that creates features

sink.addFeature(feat, QgsFeatureSink.FastInsert)

The ported plugin is available on Github.

The edge bundling plugin in action

I haven’t uploaded the plugin to the official plugin repository yet, but you can already download if from Github and give it a try:

For this example, I’m using taxi pick-up and drop-off data provided by the NYC Taxi & Limousine Commission. I downloaded the January 2017 green taxi data and extracted all trips for the 1st of January. Then I created origin-destination (OD) lines using the QGIS virtual layer feature:

To get an interesting subset of the data, I extracted only those OD flows that cross the East River and have a count of at least 5 taxis:

Now the data is ready for bundling.

If you have installed the edge bundling plugin, the force-directed edge bundling algorithm should be available in the Processing toolbox. The UI of the edge bundling algorithm looks pretty much the same as it did for the QGIS 2 Processing script:

Since this is a small dataset with only 148 OD flows, the edge bundling processes is pretty quick and we can explore the results:

Beyond this core edge bundling algorithm, the repository also contains two more scripts that still need to be ported. They include dependencies on sklearn, so it will be interesting to see how straightforward it is to convert them.

Movement data in GIS #3: visualizing massive trajectory datasets

In the fist two parts of the Movement Data in GIS series, I discussed modeling trajectories as LinestringM features in PostGIS to overcome some common issues of movement data in GIS and presented a way to efficiently render speed changes along a trajectory in QGIS without having to split the trajectory into shorter segments.

While visualizing individual trajectories is important, the real challenge is trying to visualize massive trajectory datasets in a way that enables further analysis. The out-of-the-box functionality of GIS is painfully limited. Except for some transparency and heatmap approaches, there is not much that can be done to help interpret “hairballs” of trajectories. Luckily researchers in visual analytics have already put considerable effort into finding solutions for this visualization challenge. The approach I want to talk about today is by Andrienko, N., & Andrienko, G. (2011). Spatial generalization and aggregation of massive movement data. IEEE Transactions on visualization and computer graphics, 17(2), 205-219. and consists of the following main steps:

  1. Extracting characteristic points from the trajectories
  2. Grouping the extracted points by spatial proximity
  3. Computing group centroids and corresponding Voronoi cells
  4. Deviding trajectories into segments according to the Voronoi cells
  5. Counting transitions from one cell to another

The authors do a great job at describing the concepts and algorithms, which made it relatively straightforward to implement them in QGIS Processing. So far, I’ve implemented the basic logic but the paper contains further suggestions for improvements. This was also my first pyQGIS project that makes use of the measurement value support in the new geometry engine. The time information stored in the m-values is used to detect stop points, which – together with start, end, and turning points – make up the characteristic points of a trajectory.

The following animation illustrates the current state of the implementation: First the “hairball” of trajectories is rendered. Then we extract the characteristic points and group them by proximity. The big black dots are the resulting group centroids. From there, I skipped the Voronoi cells and directly counted transitions from “nearest to centroid A” to “nearest to centroid B”.

(data credits: GeoLife project)

From thousands of individual trajectories to a generalized representation of overall movement patterns (data credits: GeoLife project, map tiles: Stamen, map data: OSM)

The resulting visualization makes it possible to analyze flow strength as well as directionality. I have deliberately excluded all connections with a count below 10 transitions to reduce visual clutter. The cell size / distance between point groups – and therefore the level-of-detail – is one of the input parameters. In my example, I used a target cell size of approximately 2km. This setting results in connections which follow the major roads outside the city center very well. In the city center, where the road grid is tighter, trajectories on different roads mix and the connections are less clear.

Since trajectories in this dataset are not limited to car trips, it is expected to find additional movement that is not restricted to the road network. This is particularly noticeable in the dense area in the west where many slow trajectories – most likely from walking trips – are located. The paper also covers how to ensure that connections are limited to neighboring cells by densifying the trajectories before computing step 4.

trajectory_generalization

Running the scripts for over 18,000 trajectories requires patience. It would be worth evaluating if the first three steps can be run with only a subsample of the data without impacting the results in a negative way.

One thing I’m not satisfied with yet is the way to specify the target cell size. While it’s possible to measure ellipsoidal distances in meters using QgsDistanceArea (irrespective of the trajectory layer’s CRS), the initial regular grid used in step 2 in order to group the extracted points has to be specified in the trajectory layer’s CRS units – quite likely degrees. Instead, it may be best to transform everything into an equidistant projection before running any calculations.

It’s good to see that PyQGIS enables us to use the information encoded in PostGIS LinestringM features to perform spatio-temporal analysis. However, working with m or z values involves a lot of v2 geometry classes which work slightly differently than their v1 counterparts. It certainly takes some getting used to. This situation might get cleaned up as part of the QGIS 3 API refactoring effort. If you can, please support work on QGIS 3. Now is the time to shape the PyQGIS API for the following years!


Speeding up your PyQGIS scripts

I’ve recently spent some time optimising the performance of various QGIS plugins and algorithms, and I’ve noticed that there’s a few common performance traps which developers fall into when fetching features from a vector layer. In this post I’m going to explore these traps, what makes them slow, and how to avoid them.

As a bit of background, features are fetched from a vector layer in QGIS using a QgsFeatureRequest object. Common use is something like this:

request = QgsFeatureRequest()
for feature in vector_layer.getFeatures(request):
    # do something

This code would iterate over all the features in layer. Filtering the features is done by tweaking the QgsFeatureRequest, such as:

request = QgsFeatureRequest().setFilterFid(1001)
feature_1001 = next(vector_layer.getFeatures(request))

In this case calling getFeatures(request) just returns the single feature with an ID of 1001 (which is why we shortcut and use next(…) here instead of iterating over the results).

Now, here’s the trap: calling getFeatures is expensive. If you call it on a vector layer, QGIS will be required to setup an new connection to the data store (the layer provider), create some query to return data, and parse each result as it is returned from the provider. This can be slow, especially if you’re working with some type of remote layer, such as a PostGIS table over a VPN connection. This brings us to our first trap:

Trap #1: Minimise the calls to getFeatures()

A common task in PyQGIS code is to take a list of feature IDs and then request those features from the layer. A see a lot of older code which does this using something like:

for id in some_list_of_feature_ids:
    request = QgsFeatureRequest().setFilterFid(id)
    feature = next(vector_layer.getFeatures(request))
    # do something with the feature

Why is this a bad idea? Well, remember that every time you call getFeatures() QGIS needs to do a whole bunch of things before it can start giving you the matching features. In this case, the code is calling getFeatures() once for every feature ID in the list. So if the list had 100 features, that means QGIS is having to create a connection to the data source, set up and prepare a query to match a single feature, wait for the provider to process that, and then finally parse the single feature result. That’s a lot of wasted processing!

If the code is rewritten to take the call to getFeatures() outside of the loop, then the result is:

request = QgsFeatureRequest().setFilterFids(some_list_of_feature_ids)
for feature in vector_layer.getFeatures(request):
    # do something with the feature

Now there’s just a single call to getFeatures() here. QGIS optimises this request by using a single connection to the data source, preparing the query just once, and fetching the results in appropriately sized batches. The difference is huge, especially if you’re dealing with a large number of features.

Trap #2: Use QgsFeatureRequest filters appropriately

Here’s another common mistake I see in PyQGIS code. I often see this one when an author is trying to do something with all the selected features in a layer:

for feature in vector_layer.getFeatures():
    if not feature.id() in vector_layer.selectedFeaturesIds():
        continue

    # do something with the feature

What’s happening here is that the code is iterating over all the features in the layer, and then skipping over any which aren’t in the list of selected features. See the problem here? This code iterates over EVERY feature in the layer. If you’re layer has 10 million features, we are fetching every one of these from the data source, going through all the work of parsing it into a QGIS feature, and then promptly discarding it if it’s not in our list of selected features. It’s very inefficient, especially if fetching features is slow (such as when connecting to a remote database source).

Instead, this code should use the setFilterFids() method for QgsFeatureRequest:

request = QgsFeatureRequest().setFilterFids(vector_layer.selectedFeaturesIds())
for feature in vector_layer.getFeatures(request):
    # do something with the feature

Now, QGIS will only fetch features from the provider with matching feature IDs from the list. Instead of fetching and processing every feature in the layer, only the actual selected features will be fetched. It’s not uncommon to see operations which previously took many minutes (or hours!) drop down to a few seconds after applying this fix.

Another variant of this trap uses expressions to test the returned features:

filter_expression = QgsExpression('my_field &gt; 20')
for feature in vector_layer.getFeatures():
    if not filter_expression.evaluate(feature):
        continue

    # do something with the feature

Again, this code is fetching every single feature from the layer and then discarding it if it doesn’t match the “my_field > 20” filter expression. By rewriting this to:

request = QgsFeatureRequest().setFilterExpression('my_field &gt; 20')
for feature in vector_layer.getFeatures(request):
    # do something with the feature

we hand over the bulk of the filtering to the data source itself. Recent QGIS versions intelligently translate the filter into a format which can be applied directly at the provider, meaning that any relevant indexes and other optimisations can be applied by the provider itself. In this case the rewritten code means that ONLY the features matching the ‘my_field > 20’ criteria are fetched from the provider – there’s no time wasted messing around with features we don’t need.

 

Trap #3: Only request values you need

The last trap I often see is that more values are requested from the layer then are actually required. Let’s take the code:

my_sum = 0
for feature in vector_layer.getFeatures(request):
    my_sum += feature['value']

In this case there’s no way we can optimise the filters applied, since we need to process every feature in the layer. But – this code is still inefficient. By default QGIS will fetch all the details for a feature from the provider. This includes all attribute values and the feature’s geometry. That’s a lot of processing – QGIS needs to transform the values from their original format into a format usable by QGIS, and the feature’s geometry needs to be parsed from it’s original type and rebuilt as a QgsGeometry object. In our sample code above we aren’t doing anything with the geometry, and we are only using a single attribute from the layer. By calling setFlags( QgsFeatureRequest.NoGeometry ) and setSubsetOfAttributes() we can tell QGIS that we don’t need the geometry, and we only require a single attribute’s value:

my_sum = 0
request = QgsFeatureRequest().setFlags(QgsFeatureRequest.NoGeometry).setSubsetOfAttributes(['value'], vector_layer.fields() )
for feature in vector_layer.getFeatures(request):
    my_sum += feature['value']

None of the unnecessary geometry parsing will occur, and only the ‘value’ attribute will be fetched and populated in the features. This cuts down both on the processing required AND the amount of data transfer between the layer’s provider and QGIS. It’s a significant improvement if you’re dealing with larger layers.

Conclusion

Optimising your feature requests is one of the easiest ways to speed up your PyQGIS script! It’s worth spending some time looking over all your uses of getFeatures() to see whether you can cut down on what you’re requesting – the results can often be mind blowing!

QGIS 3 is underway – what does it mean for your plugins and scripts?

With the imminent release of QGIS 2.16, the development attention has now shifted to the next scheduled release – QGIS 3.0! If you haven’t been following the discussion surrounding this I’m going to try and summarise what exactly 3.0 means and how it will impact any scripts or plugins you’ve developed for QGIS.

qgis_icon.svgQGIS 3.0 is the first major QGIS release since 2.0 was released way back in September 2013. Since that release so much has changed in QGIS… a quick glance over the release notes for 2.14 shows that even for this single point release there’s been hundreds of changes. Despite this, for all 2.x releases the PyQGIS API has remained stable, and a plugin or script which was developed for use in QGIS 2.0 will still work in QGIS 2.16.

Version 3.0 will introduce the first PyQGIS API break since 2013. An API break like this is required to move QGIS to newer libraries such as Qt 5 and Python 3, and allows the development team the flexibility to tackle long-standing issues and limitations which cannot be fixed using the 2.x API. Unfortunately, the side effect of this API break is that the scripts and plugins which you use in QGIS 2.x will no longer work when QGIS 3.0 is released!

Numerous API breaking changes have already started to flow into QGIS, and 2.16 isn’t even yet publicly available. The best way to track these changes is to keep an eye on the “API changes” documentation.  This document describes all the changes which are flowing in which affect PyQGIS code, and describe how best they should be addressed by plugin and script maintainers. Some changes are quite trivial and easy to update code for, others are more extreme (such as changes surrounding moving to PyQt5 and Python 3) and may require significant time to adapt for.

I’d encourage all plugin and script developers to keep watching the API break documentation, and subscribe to the developers list for additional information about required changes as they are introduced.

If you’re looking for assistance or to outsource adaptation of your plugins and scripts to QGIS 3.0 – the team at North Road are ideally placed to assist! Our team includes some of the most experienced QGIS developers who are directly involved with the development of QGIS 3.0, so you can be confident knowing that your code is in good hands. Just contact us to discuss your QGIS development requirements.

You can read more about QGIS 3.0 API changes in The road to QGIS 3.0 – part 1.

The road to QGIS 3.0 – part 1

qgis_icon.svgAs we discussed in QGIS 3 is under way, the QGIS project is working toward the next major version of the application and these developments have major impact on any custom scripts or plugins you’ve developed for QGIS.

We’re now just over a week into this work, and already there’s been tons of API breaking changes landing the code base. In this post we’ll explore some of these changes, what’s motivated them, and what they mean for your scripts.

The best source for keeping track of these breaking changes is to watch the API break documentation on GitHub. This file is updated whenever a change lands which potentially breaks plugins/scripts, and will eventually become a low-level guide to porting plugins to QGIS 3.0.

API clean-ups

So far, lots of the changes which have landed have related to cleaning up the existing API. These include:

Removal of deprecated API calls

The API has been frozen since QGIS 2.0 was released in 2013, and in the years since then many things have changed. As a result, different parts of the API were deprecated along the way as newer, better ways of doing things were introduced. The deprecated code was left intact so that QGIS 2.x plugins would still all function correctly. By removing these older, deprecated code paths it enables the QGIS developers to streamline the code, remove hacky workarounds, untested methods, and just generally “clean things up”. As an example, the older labelling system which pre-dates QGIS 2.0 (it had no collision detection, no curved labels, no fancy data defined properties or rule based labelling!) was still floating around just in case someone tried to open a QGIS 1.8 project. That’s all gone now, culling over 5000 lines of outdated, unmaintained code. Chances are this won’t affect your plugins in the slightest. Other removals, like the removal of QgsMapRenderer (the renderer used before multi-threaded rendering was introduced) likely have a much larger impact, as many scripts and plugins were still using QgsMapRenderer classes and calls. These all need to be migrated to the new QgsMapRendererJob and QgsMapSettings classes.

Renaming things for consistency

Consistent naming helps keep the API predictable and more user friendly. Lots of changes have landed so far to make the naming of classes and methods more consistent. These include things like:

  • Making sure names use consistent capitalization. Eg, there was previously methods named “writeXML” and “writeXml”. These have all been renamed to consistently use camel case, including for acronyms. (In case you’re wondering – this convention is used to follow the Qt library conventions).
  • Consistent use of terms. The API previously used a mix of “CRS” and “SRS” for similar purposes – it now consistently uses “CRS” for a coordinate reference system.
  • Removal of abbreviations. Lots of abbreviated words have been removed from the names, eg “destCrs” has become “destinationCrs”. The API wasn’t consistently using the same abbreviations (ie “dest”/”dst”/”destination”), so it was decided to remove all use of abbreviated words and replace them with the full word. This helps keep things predictable, and is also a bit friendlier for non-native English speakers.

The naming changes all need to be addressed to make existing scripts and plugins compatible with QGIS 3.0. It’s potentially quite a lot of work for plugin developers, but in the long term it will make the API easier to use.

Changes to return and argument types

There’s also been lots of changes relating to the types of objects returned by functions, or the types of objects used as function arguments. Most of these involve changing the c++ types from pointers to references, or from references to copies. These changes are being made to strengthen the API and avoid potential crashes. In most cases they don’t have any affect on PyQGIS code, with some exceptions:

  • Don’t pass Python “None” objects as QgsCoordinateReferenceSystems or as QgsCoordinateTransforms. In QGIS 3.0 you must pass invalid QgsCoordinateReferenceSystem objects (“QgsCoordinateReferenceSystem()”) or invalid QgsCoordinateTransform (“QgsCoordinateTransform()”) objects instead.

Transparent caching of CRS creation

The existing QgsCRSCache class has been removed. This class was used to cache the expensive results of initializing a QgsCoordinateReferenceSystem object, so that creating the same CRS could be done instantly and avoid slow databases lookups. In QGIS 3.0 this caching is now handled transparently, so there is no longer a need for the separate QgsCRSCache and it has been removed. If you were using QgsCRSCache in your PyQGIS code, it will need to be removed and replaced with the standard QgsCoordinateReferenceSystem constructors.

This change has the benefit that many existing plugins which were not explicitly using QgsCRSCache will now gain the benefits of the faster caching mechanism – potentially this could dramatically speed up existing plugin algorithms.

In summary

The QGIS developers have been busy fixing, improving and cleaning up the PyQGIS API. We recognise that these changes result in significant work for plugin and script developers, so we’re committed to providing quality documentation for how to adapt your code for these changes, and we will also investigate the use of automated tools to help ease your code transition to QGIS 3.0. We aren’t making changes lightly, but instead are carefully refining the API to make it more predictable, streamlined and stable.

If you’d like assistance with (or to outsource) the transition of your existing QGIS scripts and plugins to QGIS 3.0, just contact us at North Road to discuss. Every day we’re directly involved in the changes moving to QGIS 3.0, so we’re ideally placed to make this transition painless for you!

Open source IDF router for QGIS

This is a follow-up on my previous post introducing an Open source IDF parser for QGIS. Today’s post takes the code further and adds routing functionality for foot, bike, and car routes including oneway streets and turn restrictions.

You can find the script in my QGIS-resources repository on Github. It creates an IDFRouter object based on an IDF file which you can use to compute routes.

The following screenshot shows an example car route in Vienna which gets quite complex due to driving restrictions. The dark blue line is computed by my script on GIP data while the light blue line is the route from OpenRouteService.org (via the OSM route plugin) on OSM data. Minor route geometry differences are due to slight differences in the network link geometries.

Screenshot 2015-08-01 16.29.57


  • Page 1 of 2 ( 32 posts )
  • >>
  • pyqgis

Back to Top

Sustaining Members