Background

Understanding which regions QGIS is being used in, which versions are in active use, which platforms it is being used on, and how many users we have is hugely beneficial to our ability as a project to serve our users. Back in 2017 at the bi-annual QGIS hackfest in Nødebo, Denmark, we had a long discussion about key project goals and the need to better understand our user base in order to plan the future direction of the project and allocate funding and resources to where they are needed most.

Typically proprietary software vendors have ready access to detailed user data through telemetry code which they embed in their software. This telemetry code ‘phones home’ key metrics, which together with other techniques such as license sales analysis gives them a very detailed insight into their user base. The data these vendors collect is typically not shared, so their users do not benefit from being able to understand how their data is used.

For QGIS.org, resorting to what are generally considered nefarious, privacy-invading techniques for siphoning data from our users goes against the ethos we try to promote as an open project. Further, since QGIS is freely available and doesn’t require any self-registration, we do not have a user database we can consult for such analytics. Additional factors make understanding usage levels hard. For example, a single user can download a copy of a QGIS installer and distribute it to many other users, and conversely web crawlers and bots can download many copies of QGIS installers and never install them. Because of this, simply counting the number of downloads from our website does not give a useful picture of our user base.

So we needed to come up with an approach that:

  1. Does not invade our users’ privacy
  2. Does not require including telemetry code in QGIS which exfiltrates user information from their system
  3. Does not store any user-identifiable data on our servers
  4. Is open and transparent in the data collection methodology
  5. Openly shares the insights we gain from our analytics to the broader community

The most obvious privacy-respecting way we could find to understand more about our users was to collect metrics of access to the QGIS News Feed. In order to display the latest news on startup, QGIS Desktop makes a request to https://feed.qgis.org when it is opened. On the server that hosts the feed, we can then use the web server logs to understand which operating system and version of QGIS made the news feed request. Both of these pieces of information are included in the User-Agent header sent by QGIS when it makes a request to the QGIS News Feed. Additionally, using the GeoIP library we can resolve each request to the country from which it originated.
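To make the log-reading step concrete, here is a minimal Python sketch of pulling the QGIS version and operating system out of a User-Agent string. The header layout used here (`Mozilla/5.0 QGIS/<version code>/<OS name>`) is an assumption made for illustration, and `parse_user_agent` and `ClientInfo` are hypothetical names; this is not the code running on the feed server.

```python
import re
from typing import NamedTuple, Optional

# ASSUMPTION: the User-Agent ends with "QGIS/<numeric version code>/<OS name>",
# e.g. "Mozilla/5.0 QGIS/33400/Ubuntu 22.04.4 LTS". The real header may differ.
QGIS_UA_PATTERN = re.compile(r"QGIS/(?P<version>\d+)/(?P<os>.+)$")


class ClientInfo(NamedTuple):
    """The two values read from the User-Agent header (hypothetical type)."""
    qgis_version: str
    operating_system: str


def parse_user_agent(user_agent: str) -> Optional[ClientInfo]:
    """Return the QGIS version and OS, or None if this is not a QGIS client."""
    match = QGIS_UA_PATTERN.search(user_agent.strip())
    if match is None:
        return None
    return ClientInfo(match.group("version"), match.group("os").strip())
```

Requests from anything that does not look like QGIS (a bot or a browser, say) return `None` and can simply be skipped.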

This process is anonymous, transparent, and simple to disable. It does not identify unique machines. Only one event is logged per unique network per hour. Only one event is logged per QGIS installation per day, and the event is only triggered when the user opens the QGIS Desktop application.

Operating system statistics are derived from QGIS version information, and no system fingerprinting or telemetry is implemented.

Location information is derived from the request source IP address, which is immediately discarded on the server after resolving it to the country of origin.

No logging on the QGIS News Feed server occurs for legacy installations that do not have the news feed feature, for offline usage of QGIS, or for installations where feed retrieval is disabled (see below for info on how to disable it). The statistics will also be skewed where atypical networking infrastructure is in effect, such as a virtual private network.

Despite these caveats, the statistics should provide a good high-level overview of how QGIS is being used, such as the breakdown of QGIS across operating systems and versions – information that is incredibly useful to the QGIS developer team. Only the following four pieces of information are collected:

  • The date (aggregated by day)
  • The QGIS version
  • The Operating System
  • Country (based on IP which is immediately discarded)
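As an illustration of how a request could be reduced to just these four fields, here is a minimal Python sketch. It is written under stated assumptions and is not the project's actual pipeline: `RawRequest`, `DailyKey`, and `aggregate` are hypothetical names, and `country_of` is a stand-in for whatever GeoIP lookup the server uses.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Iterable, NamedTuple


class RawRequest(NamedTuple):
    """One news feed request as seen by the server (hypothetical shape)."""
    timestamp: datetime
    source_ip: str
    qgis_version: str
    operating_system: str


class DailyKey(NamedTuple):
    """The only four values that are kept."""
    day: str
    qgis_version: str
    operating_system: str
    country: str


def aggregate(requests: Iterable[RawRequest],
              country_of: Callable[[str], str]) -> Counter:
    """Count requests per (day, version, OS, country).

    `country_of` is an assumed lookup function mapping an IP to a country.
    The IP is used only for that lookup and never stored in the result.
    """
    counts: Counter = Counter()
    for req in requests:
        country = country_of(req.source_ip)
        day = req.timestamp.date().isoformat()  # drop the time of day
        counts[DailyKey(day, req.qgis_version, req.operating_system, country)] += 1
    return counts
```

Because the counter is keyed only on the day, version, operating system, and country, two requests from different addresses in the same country collapse into a single row with a count of two, and nothing in the stored result can be traced back to an individual address.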

Opting out

If you wish to opt out of this data collection, disabling the feed retrieval, using QGIS offline, or blocking access to the QGIS RSS feed address (feed.qgis.org) on your network will exclude you from this process. QGIS Desktop provides options for disabling version checking and feed access under Settings ➔ Options ➔ General ➔ Application. Note that by default this setting is specific to each individual user profile.

Viewing the analytics

We have made a dashboard publicly available at https://analytics.qgis.org. The dashboard was made using the fantastic open-source Metabase analytics package.

Credits: This post was written by Charles Dixon-Paver and Tim Sutton