This website describes the Cluster Points subplugin of the Processing Toolbox and its functionality.
Cluster Points offers a set of clustering tools for automatically grouping the points of a point layer in Quantum GIS, minimizing the intra-group distance and maximizing the between-group distance. The user may choose between two inherently different algorithms. First, K-means clustering randomly initializes the cluster centers and reassigns cluster members until the centers stop moving. Second, agglomerative hierarchical clustering starts with as many clusters as there are points and gradually merges individual clusters according to a link function.
Cluster Points takes a point shapefile as input. As output, a new field Cluster_ID is appended to the attribute table.
Cluster Points is free software and is offered without guarantee or warranty. You can redistribute it and/or modify it under the terms of version 3 of the GNU General Public License as published by the Free Software Foundation. Bug reports and suggestions are welcome at the e-mail address above.
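The K-means procedure described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the plugin's actual code: the function name `kmeans`, its parameters, the use of squared Euclidean distance, and the choice of input points as starting centers are all hypothetical.

```python
import random

def kmeans(points, k, seed=None, max_iter=100):
    """Cluster 2-D points (tuples) into k groups; returns (labels, centers).

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    # Random initialization: pick k distinct input points as starting centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance is assumed here).
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                            + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:  # the centers stopped moving
            break
        centers = new_centers
    return labels, centers
```

Calling `kmeans(points, 2, seed=42)` twice on the same list returns the same labels, because the seed fixes the random initialization.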
Note that some code segments have been taken from the built-in ftools and the MMQGIS plugin. This plugin was started during the project phase of a GIS-Analyst training course in Berlin (GIS-Trainer). I acknowledge the assistance of the GIS-Trainer tutors and my classmates Juliane, Bennet and Sebastian.
The Clustering Tool offers spatial clustering of a point layer based on the mutual distances between points. In essence, the inter-class distances are maximized while the intra-class distances are minimized. The user always needs to define the desired number of clusters (the minimum is 2). The user also needs to choose between the Euclidean distance and the Manhattan distance for the cluster computation. Since the K-means algorithm is based on a random initialization, a random seed can be specified for this type of clustering to make the results reproducible. For hierarchical clustering, a linkage (i.e. a link function) must be specified, which determines where the distances between individual clusters are measured.
Two inherently different clustering types are available: K-means clustering and agglomerative hierarchical clustering.
The output is always a new field/attribute labelled Cluster ID, appended to the input shapefile, which indicates the cluster membership of individual points.
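To make the distance and linkage options more concrete, here is a hedged sketch of the two distance measures and of agglomerative merging driven by a link function. All names (`euclidean`, `manhattan`, `single_link`, `complete_link`, `agglomerate`) are hypothetical, and the two linkages shown are common textbook choices assumed for illustration; the text above does not say which linkages the plugin offers.

```python
import math

def euclidean(p, q):
    # Straight-line distance between two 2-D points.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    # Sum of the absolute coordinate differences.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def single_link(a, b, dist):
    # Assumed linkage: distance between the two closest members.
    return min(dist(p, q) for p in a for q in b)

def complete_link(a, b, dist):
    # Assumed linkage: distance between the two farthest members.
    return max(dist(p, q) for p in a for q in b)

def agglomerate(points, k, dist=euclidean, link=single_link):
    """Merge the two closest clusters until only k remain; returns labels."""
    # Start with one cluster per point (stored as lists of point indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a = [points[m] for m in clusters[i]]
                b = [points[m] for m in clusters[j]]
                d = link(a, b, dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters.pop(j))
    labels = [0] * len(points)
    for cid, members in enumerate(clusters):
        for m in members:
            labels[m] = cid
    return labels
```

For example, `agglomerate(points, 8, dist=manhattan, link=complete_link)` would request 8 clusters using the Manhattan distance and the farthest-member linkage. The brute-force search over all cluster pairs keeps the sketch short but scales poorly for large layers.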
To illustrate the functionality of the plugin, we look at a set of customer addresses of a company in Berlin (polygons delineate Berlin districts). Let's assume the company wants to install 8 new logistics centers across the city and needs to know the optimum locations to minimize the distances between the individual logistics centers and the nearby customers (875 altogether).
To find the optimum locations, the Clustering Tool is run to find groups of customers who are close to each other. The user sets 8 as the target number of clusters and runs the algorithm (in this case K-means). The output is a point layer with the same number of points as the input, but with the new field Cluster ID appended to the attribute table and the cluster members displayed in color according to their Cluster ID.
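As described, the plugin output is the Cluster ID per point. One possible follow-up step, sketched here as an assumption rather than a feature of the plugin, is to take the mean coordinate of each cluster's members as a candidate location for a logistics center. The function name `cluster_means` and the tiny sample data are hypothetical.

```python
from collections import defaultdict

def cluster_means(points, cluster_ids):
    """Return {cluster_id: (mean_x, mean_y)} for 2-D points.

    Hypothetical helper; assumes one cluster ID per point, in the same order.
    """
    groups = defaultdict(list)
    for p, cid in zip(points, cluster_ids):
        groups[cid].append(p)
    return {
        cid: (sum(x for x, _ in members) / len(members),
              sum(y for _, y in members) / len(members))
        for cid, members in groups.items()
    }

# Made-up coordinates with two clusters, for illustration only.
sites = cluster_means([(0, 0), (2, 0), (10, 10)], [0, 0, 1])
```

The mean minimizes the sum of squared Euclidean distances to the cluster members, so it is a natural candidate when clustering with the Euclidean distance. For the Manhattan distance, a coordinate-wise median would be the analogous choice.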