Principal component analysis (PCA) is a common data reduction technique that is used to reduce the dimensionality of multi-dimensional space. In the field of remote sensing, PCA is often used to reduce the number of bands of multi-spectral, or hyper-spectral, imagery. Image correlation analysis often reveals a substantial level of correlation among bands of multi-spectral imagery. This correlation represents data redundancy, i.e. fewer images than the number of bands are required to represent the same information, where the information is related to variation within the imagery. PCA transforms the original data set of n bands into n 'component' images, where each component image is uncorrelated with all other components. The technique works by transforming the axes of the multi-spectral space such that it coincides with the directions of greatest correlation. Each of these new axes are orthogonal to one another, i.e. they are at right angles. PCA is therefore a type of coordinate system transformation. The PCA component images are arranged such that the greatest amount of variance (or information) within the original data set, is contained within the first component and the amount of variance decreases with each component. It is often the case that the majority of the information contained in a multi-spectral data set can be represented by the first three or four PCA components. The higher-order components are often associated with noise in the original data set.

The user must specify the names of the multiple input images (inputs). Additionally, the user must specify whether to perform a standardized PCA (standardized) and the number of output components (num_comp) to generate (all components will be output unless otherwise specified). A standardized PCA is performed using the correlation matrix rather than the variance-covariance matrix. This is appropriate when the variances in the input images differ substantially, such as would be the case if they contained values that were recorded in different units (e.g. feet and meters) or on different scales (e.g. 8-bit vs. 16 bit).

Several outputs will be generated when the tool has completed. The PCA report will be embedded within an output (output) HTML file, which should be automatically displayed after the tool has completed. This report contains useful data summarizing the results of the PCA, including the explained variances of each factor, the Eigenvalues and Eigenvectors associated with factors, the factor loadings, and a scree plot. The first table that is in the PCA report lists the amount of explained variance (in non-cumulative and cumulative form), the Eigenvalue, and the Eigenvector for each component. Each of the PCA components refer to the newly created, transformed images that are created by running the tool. The amount of explained variance associated with each component can be thought of as a measure of how much information content within the original multi-spectral data set that a component has. The higher this value is, the more important the component is. This same information is presented in graphical form in the scree plot, found at the bottom of the PCA report. The Eigenvalue is another measure of the information content of a component and the eigenvector describes the mathematical transformation (rotation coordinates) that correspond to a particular component image.

Factor loadings are also output in a table within the PCA text report (second table). These loading values describe the correlation (i.e. r values) between each of the PCA components (columns) and the original images (rows). These values show you how the information contained in an image is spread among the components. An analysis of factor loadings can be reveal useful information about the data set. For example, it can help to identify groups of similar images.

PCA is used to reduce the number of band images necessary for classification (i.e. as a data reduction technique), for noise reduction, and for change detection applications. When used as a change detection technique, the major PCA components tend to be associated with stable elements of the data set while variance due to land-cover change tend to manifest in the high-order, 'change components'. When used as a noise reduction technique, an inverse PCA is generally performed, leaving out one or more of the high-order PCA components, which account for noise variance.

Note: the current implementation reads every raster into memory at one time. This is because of the calculation of the co-variances. As such, if the entire image stack cannot fit in memory, the tool will likely experience an out-of-memory error. This tool should be run using the wd flag to specify the working directory into which the component images will be written.

Function Signature

def principal_component_analysis(self, rasters: List[Raster], output_html_file: str, num_components: int = 2, standardized: bool = False) -> List[Raster]: ...

Project Links

WbW Homepage User Manual Support WbW