QGIS Planet

Installing python GDAL into a python virtualenv in OSX

Yes, I know it's a bit weird to see the word 'OSX' in an article by me... I recently bought a Mac for testing InaSAFE and fooling around with FOSSGIS software on OSX. Besides the two or three days it took to get it set up as a development machine for FOSSGIS stuff (as compared to a few hours - limited mainly by internet speed - on Linux) it hasn't been too bad an experience.

I used William Kyngesburye's excellent packages for installing all the FOSSGIS dependencies (barring QGIS, which of course I wanted to build myself). I've been trying to see whether OSX is a usable platform for doing GeoDjango work and ran into a few problems:

  • The GDAL Python package needs compilation when used in a Python virtualenv (which of course I use)
  • William's packages provide PostGIS 2.0, whereas GeoDjango still really only works out of the box with PostGIS 1.5

In this post I outline how I got GDAL to build in my venv. Note that I assume you have William's frameworks and the Xcode / compilation toolchain in place.

# activate the virtualenv and fetch the GDAL source package without installing it
source venv/bin/activate
pip install --no-install GDAL
# build the extension against William's GDAL framework
cd venv/build/GDAL
python setup.py build_ext \
  --gdal-config=/usr/local/gdal/1.9/bin/gdal-config \
  --library-dirs=/Library/Frameworks/GDAL.framework/Versions/1.9/unix/lib/ \
  --include-dirs=/Library/Frameworks/GDAL.framework/Versions/1.9/Headers/
# install the already-built package into the venv
pip install --no-download GDAL

Now verify it is installed:

pip freeze

You should see GDAL listed. Another check, this time from a Python shell inside the venv:

>>> from osgeo import gdal
>>> gdal.__version__
'1.9.2'

Confirm that it is really using the module installed in your venv:

>>> gdal.__file__
'/Users/timlinux/dev/python/catalogue/venv/lib/python2.7/site-packages/osgeo/gdal.pyc'
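
As a final sanity check it can be worth confirming that the bindings can actually open a raster end to end. A minimal sketch, run inside the activated venv (the path below is just a placeholder for any raster you have handy):

python -c "from osgeo import gdal; ds = gdal.Open('/tmp/example.tif'); print(ds is not None and ds.RasterCount)"

This prints the band count if the file opened, or False if GDAL could not read it.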

Creating coloured rasters with GDAL

One of the most popular posts on my blog ever is my article about creating GDAL hillshades: 'A workflow for creating beautiful relief shaded DEMS using GDAL'. In the techniques outlined in the aforementioned article, colours are assigned across a continuum and superimposed on top of a hillshaded model. The GDAL tools in QGIS [...]
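
For reference, the heart of that technique is GDAL's gdaldem utility. A minimal sketch, assuming a DEM called dem.tif and a plain-text colour ramp file (both filenames are placeholders, not from the original article):

gdaldem hillshade dem.tif hillshade.tif -z 2
gdaldem color-relief dem.tif colour_ramp.txt coloured.tif

The colour ramp file is just a list of elevation values followed by R G B triplets, one per line.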

Selecting GDAL Drivers on the fly with QGIS

There is quite an old ticket in the QGIS bug tracker relating to the need to be able to select which GDAL drivers to use. The issue is this: GDAL in some cases provides multiple drivers for the same image type. For example, JPEG2000 datasets can be opened using ECW's proprietary driver, Kakadu (gotta love that name!), OpenJPEG, etc. The problem is that there isn't actually any way to choose which driver GDAL should use for a given image format - it works on a first come, first served basis. This is a bit of a problem sometimes when you actually want to be using a different driver.

In some work I did recently for SANSA (the South African Space Agency), I built a custom version of QGIS for them. They are particularly keen on the JPEG2000 format since it is an open standard and provides good compression. However, the custom build of QGIS I made for them includes the ECW JPEG2000 driver, which unfortunately chokes and dies (no fault of GDAL, it happens down in the ECW code itself) when trying to open their JP2 imagery. Also, we would prefer to use a FOSS driver for JP2, in keeping with the general ethos of being a FOSS project. So all this prompted me to implement the needed support to let you disable GDAL drivers at runtime in QGIS, which is illustrated a little by the screenshots below.
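
For what it's worth, GDAL itself exposes a related knob: the GDAL_SKIP configuration option tells GDAL not to register the named drivers at all. A small sketch (the driver short name JP2ECW and the file name are just examples, not taken from the SANSA work):

GDAL_SKIP=JP2ECW gdalinfo example.jp2
gdalinfo --formats | grep -i jp2

The first command opens the file with the ECW JPEG2000 driver skipped, so another JP2 driver gets a chance; the second lists the registered formats so you can see which JP2 drivers are available.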

 

By default GDAL wants to use the ECW JPEG2000 driver.

Using the options dialog, you can now disable a driver.

After the driver is disabled (and after restarting QGIS), the alternate JP2 driver is available.


Improvements to raster performance in QGIS master

Today I committed changes to QGIS master that I started working on at the Lisbon hackfest and finalised this week (along with a unit test, yay!). The changes improve the way in which statistics are gathered for rasters. First a little history: the raster stats code in QGIS is some of the earliest work I have done in QGIS. The code was also hugely inefficient - basically it completely scans every pixel in the file twice! It does this because it extracts minima, maxima, stddev etc. from each band in the raster - and the algorithm used to calculate the stddev requires first calculating the sum of means, hence the second pass. Of course we use the underlying GDAL library to traverse the raster, but we completely bypassed the GDAL stats gathering functions. Now from my testing (see below), the difference between gathering stats in QGIS as opposed to directly in GDAL was not that great. However, the real killer is the fact that QGIS had no reasonable implementation to cache these statistics - either within a session or between sessions. This resulted in quite a lot of complaints about the slow loading of rasters in QGIS and the spontaneous re-collection of stats when making various changes to raster symbology etc.

With the release of 1.7, Radim Blazek implemented 'proper' support for raster providers so that they follow a similar model to the vector providers in QGIS. This opens up room for provider-specific optimisations and for implementing new providers in the future, e.g. for PostGIS Raster, Terralib etc. At the Lisbon hackfest, I refactored the provider interface so that it provides a default implementation of stats gathering which can be overloaded by specific providers, each then being able to use best practice for that data source when gathering stats. I then updated the GDAL provider to use GDAL native calls to fetch statistics.

When GDAL gathers stats, it stores them in a file named after the source raster but with a .aux.xml extension. As a bonus, GDAL also stores histogram data in this file. Thanks to some excellent tips from Etienne (Sky?) on the QGIS mailing list, my implementation was further improved so that it actually uses these cached stats from GDAL properly. So let's see how performance is improved with these changes. I tested using a ~650 MB TIFF hillshaded DEM from one of my clients.
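
If you want to pre-generate or inspect that cached statistics file yourself, you can do it straight from the command line. A small sketch, assuming a raster called dem.tif (the filename is a placeholder):

gdalinfo -stats -hist dem.tif
cat dem.tif.aux.xml

The -stats flag forces statistics to be computed if they are not already available, and by default they end up cached in the dem.tif.aux.xml sidecar; -hist reports histogram information as well.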

 

Raster Load Performance in QGIS

Added bonus: my changes also use a more efficient approach for obtaining the min/max for a band (before all stats are gathered), and histogram calculations are cached now too. So after the initial gathering of stats and histograms, all activities when working with GDAL rasters should be near instant. By the way, the 1 in the graph above is rounded up - in my usage it was near instant in most cases.

I still need to evaluate whether it would be good to backport these changes to the 1.7 stable tree, so for now, if you want to benefit from these changes, you need to be using a nightly build or build from source.

Every 6 months we hold a hackfest / QGIS meeting and many of our users generously donate towards it. I hope the above article gives those who do support our meetings a sense of the tangible benefits that arise from these meetings - there have been many other improvements coming into master that arose from the last hackfest that we all get to enjoy! If you would like to help towards the costs of the next hackfest, visit http://qgis.org and click the 'Donate!' button on the left hand side. Have fun!

 

Frank Warmerdam to become a Google guy

If you follow the OSGeo Planet blog aggregator, you may have noticed Frank Warmerdam's recent blog post mentioning that he is off to work for Google. If I think back to the blog posts I have written, GDAL features in a great many of them - it's a mainstay tool for me and barely a day goes by that I don't gdal_translate some file or what-have-you. Hell, I just finished running a batch job that ran for about three weeks and consisted of nothing more than a bash script and GDAL commands.

In QGIS too, GDAL is one of the key underpinnings of our project - it is only now, some eight years into the project, that we are starting to abstract away the use of GDAL for reading rasters a little, to make space for other raster provider types. And of course OGR is one of the main ways that our users interact with vector data from within QGIS.

Personally, Frank has also been extremely helpful whenever I have wandered into the #gdal IRC channel looking for help. Reading his blog post made me think that Google doesn't quite realise yet just how lucky they are to get him. If Google has any heart, they will make Frank a 'minister without portfolio' and just tell him to keep working on GDAL and enjoy the free food in their cafeterias. Here's hoping, at least, that he still gets to spend a little time on the software that so many of us love using and depend on.


Batch convert a directory of tiffs to ecw

Today I wanted to batch convert a directory of .tif images to .ecw (a wavelet-compressed format). Our server has 8 cores, so it would be nice to use them all, right?

Here is the quick & dirty way I do this kind of job in parallel.

#!/bin/bash
# Convert every .tif in the current directory to ECW, writing results into ./ecw
mkdir -p ecw
for FILE in *.tif
do
  BASENAME=$(basename "$FILE" .tif)
  OUTFILE="ecw/${BASENAME}.ecw"
  echo "Processing: ${BASENAME}.tif"
  if [ -f "$OUTFILE" ] # skip if the output already exists
  then
    echo "Skipping: $OUTFILE"
  else
    /usr/local/bin/gdal_translate -of ECW -co LARGE_OK=YES "$FILE" "$OUTFILE"
  fi
done

The script is extremely simple and is set up so that you can run it multiple times without problems, because it looks to see if the output file already exists before trying to write it. If it does exist, it skips straight on to the next image.

To run 8 parallel processes I simply do this at the command prompt (I did mine in a screen session):

./toecw &
./toecw &
./toecw &
./toecw &
./toecw &
./toecw &
./toecw &
./toecw &

Afterwards you can fire up top and watch 'em go!

top - 18:21:04 up  6:41,  4 users,  load average: 10.29, 9.83, 6.69
Tasks: 216 total,   1 running, 215 sleeping,   0 stopped,   0 zombie
Cpu0  : 56.5%us, 22.5%sy,  0.0%ni, 15.7%id,  4.9%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 53.3%us, 31.6%sy,  0.0%ni,  8.9%id,  6.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 50.7%us, 37.5%sy,  0.0%ni,  4.9%id,  6.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  : 46.6%us, 38.4%sy,  0.0%ni,  4.9%id,  9.8%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu4  : 44.0%us, 29.8%sy,  0.0%ni,  8.7%id, 17.2%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu5  : 30.7%us, 57.4%sy,  0.0%ni,  1.7%id,  9.6%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu6  : 58.3%us, 23.8%sy,  0.0%ni,  9.4%id,  8.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  : 46.1%us, 38.6%sy,  0.0%ni, 10.1%id,  4.6%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:  16227956k total, 16144508k used,    83448k free,  1739140k buffers
Swap: 62492832k total,        0k used, 62492832k free, 13383020k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12717 timlinux  18  -2  197m  85m 5384 D  104  0.5   0:55.49 gdal_translate
12536 timlinux  18  -2  171m  77m 5384 S  102  0.5   1:08.95 gdal_translate
12705 timlinux  18  -2  195m  65m 5384 D  100  0.4   0:52.58 gdal_translate
12737 timlinux  18  -2  194m  64m 5384 D   97  0.4   0:40.78 gdal_translate
12549 timlinux  18  -2  195m 103m 5384 S   95  0.7   1:12.68 gdal_translate
12751 timlinux  18  -2  165m  66m 5384 S   88  0.4   0:37.46 gdal_translate
12561 timlinux  18  -2  166m  67m 5384 D   69  0.4   1:03.91 gdal_translate
12528 timlinux  18  -2  164m  65m 5384 S   16  0.4   0:18.24 gdal_translate

One thing to note - I ran this with the data sitting on a storage array - if your data all lives on a single drive you may have serious IO issues doing the above....
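
As an aside, if you would rather parallelise per file instead of launching several copies of the script, something along these lines should also work - a sketch only, assuming GNU xargs is available (the same IO caveat applies):

mkdir -p ecw
printf '%s\n' *.tif | xargs -P 8 -I {} sh -c 'gdal_translate -of ECW -co LARGE_OK=YES "{}" "ecw/$(basename "{}" .tif).ecw"'

The -P flag caps the number of gdal_translate processes running at any one time, so the eight cores stay busy without you having to count script invocations by hand.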

Extracting bands from hdf4_eos to tif

Here is a quick and dirty bash script I wrote to extract all bands from a MODIS image and save each to a TIFF file. The saved bands are named after the sub-layer in the original HDF4 file. The script actually generates a second script called batch.sh, so you can peek in there to see the underlying gdal_translate commands that are generated.

One day I need to get around to letting QGIS natively support HDF - maybe a good topic for those attending the OSGeo hackfest in NY?

#!/bin/bash
# Generate (and then run) a batch.sh containing one gdal_translate call per sub-dataset
echo "#!/bin/bash" > batch.sh
chmod +x batch.sh
# Set the field separator to linefeeds rather than spaces so layer names survive the loop
IFS=$'\n'
for FILE in *.hdf
do
  echo "$FILE"
  # List the sub-dataset names reported by gdalinfo and strip the KEY= prefix
  LAYERS=$(gdalinfo "$FILE" | grep SUBDATASET | grep NAME | sed 's/[A-Z0-9\_]*=//g')
  for LAYER in $LAYERS
  do
    LAYER=$(echo $LAYER | sed 's/  //g')
    # Build an output filename from the sub-dataset name by dropping quotes/spaces and
    # turning colons into dots
    NEWFILE=$(echo $LAYER | sed 's/"//g' | sed 's/ //g' | sed 's/:/./g').tif
    CMD="gdal_translate -of GTiff '${LAYER}' $NEWFILE"
    echo $CMD >> batch.sh
  done
done

exec ./batch.sh
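
If you only need a single band, you can skip the generated script entirely and call gdal_translate on one sub-dataset name directly. A sketch with an entirely hypothetical MODIS granule - the quoted sub-dataset string should be copied verbatim from the gdalinfo output for your own file:

gdalinfo MOD09GA.A2012001.h19v11.hdf | grep SUBDATASET
gdal_translate -of GTiff 'HDF4_EOS:EOS_GRID:"MOD09GA.A2012001.h19v11.hdf":MODIS_Grid_500m_2D:sur_refl_b01_1' sur_refl_b01.tif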
