Today I wanted to batch convert a directory of .tif images to .ecw
(Enhanced Compression Wavelet). Our server has 8 cores, so it would be
nice to use them all, right?
Here is the quick & dirty way I do this kind of job in parallel. I saved
the script as toecw and made it executable.
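#!/bin/bash
# Convert every .tif in the current directory to ecw/<name>.ecw.
mkdir -p ecw    # -p: no error if the directory already exists
for FILE in *.tif
do
    [ -e "$FILE" ] || continue    # no .tif files: the glob stays literal
    BASENAME=$(basename "$FILE" .tif)
    OUTFILE="ecw/${BASENAME}.ecw"
    if [ -f "$OUTFILE" ]    # skip if exists
    then
        echo "Skipping: $OUTFILE"
    else
        echo "Processing: ${BASENAME}.tif"
        /usr/local/bin/gdal_translate -of ECW -co LARGE_OK=YES "$FILE" "$OUTFILE"
    fi
done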
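The script is extremely simple and is set up so that you can run it
multiple times without problems, because it looks to see if the output
file already exists before trying to write it. If it does exist, it
skips straight on to the next image. That check is also what lets
several copies share the work: each one skips any image whose output
file another copy has already created.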
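The existence check and the write are two separate steps, so two copies
started at the same instant can still pick the same image. One way to
close that gap, offered here as a sketch and not as part of my original
script, is to claim each image with a marker file created under the
shell's noclobber option, which makes the redirection fail if the file
is already there. The claim function and the .claimed suffix are names
made up for this example.

```shell
#!/bin/bash
# claim MARKER: returns 0 only for the first process that creates MARKER.
# (hypothetical helper, not part of the original script)
claim() {
    # With noclobber set, '>' refuses to overwrite an existing file, so
    # checking for the marker and creating it happen in a single step.
    # The subshell keeps noclobber from leaking into the rest of the script.
    ( set -o noclobber; : > "$1" ) 2>/dev/null
}

# Sketch of how the loop body could use it (assumes ecw/ already exists):
#   if claim "ecw/${BASENAME}.claimed"; then
#       /usr/local/bin/gdal_translate -of ECW -co LARGE_OK=YES "$FILE" "$OUTFILE"
#   fi
```

The trade-off is that a marker left behind by a killed conversion has to
be deleted by hand before that image will be picked up again.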
To run 8 parallel processes I simply do this at the command prompt (I
did mine in a screen session):
./toecw &
./toecw &
./toecw &
./toecw &
./toecw &
./toecw &
./toecw &
./toecw &
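Typing the same line eight times works fine, but a small loop does the
same job and makes the number of copies easy to change. This is only a
sketch; run_parallel is a helper name I made up for it.

```shell
#!/bin/bash
# run_parallel N CMD [ARGS...]: start N background copies of CMD,
# then block until every one of them has finished.
# (hypothetical helper name)
run_parallel() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &              # launch one copy in the background
        i=$((i + 1))
    done
    wait                    # returns once all background copies have exited
}

# Example: run_parallel 8 ./toecw
```

If GNU coreutils is installed, run_parallel "$(nproc)" ./toecw starts one
copy per available core instead of a hard-coded 8.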
Afterwards you can fire up top and watch 'em go!
top - 18:21:04 up 6:41, 4 users, load average: 10.29, 9.83, 6.69
Tasks: 216 total, 1 running, 215 sleeping, 0 stopped, 0 zombie
Cpu0 : 56.5%us, 22.5%sy, 0.0%ni, 15.7%id, 4.9%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu1 : 53.3%us, 31.6%sy, 0.0%ni, 8.9%id, 6.2%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 50.7%us, 37.5%sy, 0.0%ni, 4.9%id, 6.6%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 46.6%us, 38.4%sy, 0.0%ni, 4.9%id, 9.8%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 44.0%us, 29.8%sy, 0.0%ni, 8.7%id, 17.2%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 30.7%us, 57.4%sy, 0.0%ni, 1.7%id, 9.6%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu6 : 58.3%us, 23.8%sy, 0.0%ni, 9.4%id, 8.5%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 46.1%us, 38.6%sy, 0.0%ni, 10.1%id, 4.6%wa, 0.0%hi, 0.7%si, 0.0%st
Mem: 16227956k total, 16144508k used, 83448k free, 1739140k buffers
Swap: 62492832k total, 0k used, 62492832k free, 13383020k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12717 timlinux 18 -2 197m 85m 5384 D 104 0.5 0:55.49 gdal_translate
12536 timlinux 18 -2 171m 77m 5384 S 102 0.5 1:08.95 gdal_translate
12705 timlinux 18 -2 195m 65m 5384 D 100 0.4 0:52.58 gdal_translate
12737 timlinux 18 -2 194m 64m 5384 D 97 0.4 0:40.78 gdal_translate
12549 timlinux 18 -2 195m 103m 5384 S 95 0.7 1:12.68 gdal_translate
12751 timlinux 18 -2 165m 66m 5384 S 88 0.4 0:37.46 gdal_translate
12561 timlinux 18 -2 166m 67m 5384 D 69 0.4 1:03.91 gdal_translate
12528 timlinux 18 -2 164m 65m 5384 S 16 0.4 0:18.24 gdal_translate
One thing to note: I ran this with the data sitting on a storage array.
If your data all lives on a single drive, eight processes reading and
writing at once may give you serious I/O contention, so you may want to
start fewer of them.
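If the disk turns out to be the bottleneck, the number of workers is the
knob to turn. The sketch below is a different technique from the script
above: it hands each image to exactly one worker through xargs and takes
the worker count as its first argument. It assumes an xargs that
supports -0, -I and -P and a find that supports -maxdepth and -print0
(GNU and BSD versions do; none of these are POSIX). convert_dir is a
made-up name, and the conversion command is passed in as arguments so
nothing about gdal_translate is hard-coded.

```shell
#!/bin/bash
# convert_dir JOBS CMD [ARGS...]
# For every .tif in the current directory, run
#     CMD ARGS... <name>.tif ecw/<name>.ecw
# with at most JOBS conversions running at the same time.
# (hypothetical helper; relies on non-POSIX find/xargs options)
convert_dir() {
    njobs=$1; shift
    mkdir -p ecw
    find . -maxdepth 1 -name '*.tif' -print0 |
        xargs -0 -I {} -P "$njobs" sh -c '
            src=$1; shift
            dst="ecw/$(basename "$src" .tif).ecw"
            # Skip images that already have an output file.
            [ -f "$dst" ] || "$@" "$src" "$dst"
        ' sh {} "$@"
}

# Example with two workers:
#   convert_dir 2 /usr/local/bin/gdal_translate -of ECW -co LARGE_OK=YES
```

On a single drive I would start with 2 and watch the %wa column in top
before going any higher.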