Note: this tool is part of a WhiteboxTools extension product. Please visit Whitebox Geospatial Inc. for information about purchasing a license activation key (https://www.whiteboxgeo.com/extension-pricing/).
This tool applies a pre-built random forest (RF) regression model, trained using multiple predictor rasters, or features (input_rasters), and training data, to predict a spatial distribution. This function is one of a pair of tools: random_forest_regression_fit and random_forest_regression_predict. The random_forest_regression_fit function should be used first to create the RF model, and random_forest_regression_predict can then be used to apply that model for prediction. The output of the fit tool is a byte array that is a binary representation of the RF model. This model can then be used as the input to the predict tool, along with a list of input raster predictors, which must be in the same order as those used in the fit tool (see below). The output of the predict tool is a raster.
The RF workflow is split in this way because you often need to experiment with various input predictor sets and parameter values to create an adequate model. There is no need to generate an output prediction raster during this experimentation stage, and because prediction can often be the slowest part of the RF modelling process, it is generally only performed after the final model has been identified. The binary representation of the RF-based model can be serialized (i.e., saved to a file) and later read back into memory to serve as the input for the prediction step of the workflow (see code example below).
Note: it is very important that the order of feature rasters is the same for both fitting the model and using the model for prediction. It is possible to use a model fitted to one data set to make predictions for another data set; however, the set of feature rasters specified to the prediction tool must be input in the same sequence used for building the model. For example, one may train an RF classifier on one set of multi-spectral satellite imagery and then apply that model to classify a different imagery scene, but the image band sequence must be the same for the Fit/Predict tools, otherwise inaccurate predictions will result.
import os
from whitebox_workflows import WbEnvironment

license_id = 'floating-license-id'
wbe = WbEnvironment(license_id)

try:
    wbe.verbose = True
    wbe.working_directory = "/path/to/data"

    # Read the input raster files into memory
    images = wbe.read_rasters(
        'DEV.tif',
        'profile_curv.tif',
        'tan_curv.tif',
        'slope.tif'
    )

    # Read the input training polygons into memory
    training_data = wbe.read_vector('Ottawa_soils_data.shp')

    # Train the model
    model = wbe.random_forest_regression_fit(
        images,
        training_data,
        field_name = 'Sand',
        n_trees = 50,
        min_samples_leaf = 1,
        min_samples_split = 2,
        test_proportion = 0.2
    )

    # Example of how to serialize the model, i.e., save the model, which is just binary data
    print('Saving the model to file...')
    file_path = os.path.join(wbe.working_directory, "rf_model.bin")
    with open(file_path, "wb") as file:
        file.write(bytearray(model))

    # Example of how to deserialize the model, i.e. read the model
    model = []
    with open(file_path, mode='rb') as file:
        model = list(file.read())

    # Use the model to predict
    rf_image = wbe.random_forest_regression_predict(images, model)
    wbe.write_raster(rf_image, 'rf_regression.tif', compress=True)

    print('All done!')

except Exception as e:
    print("The error raised is: ", e)
finally:
    wbe.check_in_license(license_id)
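Because the serialized model is only a sequence of bytes, it carries no visible record of the predictor order it was trained with. One way to guard against the ordering mistake described in the note above is to store the ordered feature labels in a small sidecar file next to the model and compare them before prediction. The following is a minimal sketch using only the Python standard library. The helper names save_feature_order and check_feature_order, the ".features.json" sidecar suffix, and the use of short labels such as 'slope' (rather than file names, which differ between scenes) are all assumptions made for illustration; none of them are part of whitebox_workflows.

```python
import json
from pathlib import Path
from typing import List


def _sidecar_path(model_path: str) -> Path:
    # Hypothetical convention: "<model file>.features.json" beside the model
    return Path(str(model_path) + ".features.json")


def save_feature_order(model_path: str, feature_names: List[str]) -> Path:
    """Record the ordered feature labels used when fitting the model."""
    sidecar = _sidecar_path(model_path)
    sidecar.write_text(json.dumps(list(feature_names)))
    return sidecar


def check_feature_order(model_path: str, feature_names: List[str]) -> None:
    """Raise ValueError if the labels differ in number or order from fit time."""
    expected = json.loads(_sidecar_path(model_path).read_text())
    if list(feature_names) != expected:
        raise ValueError(
            f"Feature order mismatch: model was fitted with {expected}, "
            f"but prediction was requested with {list(feature_names)}"
        )
```

In the workflow above, save_feature_order(file_path, ['DEV', 'profile_curv', 'tan_curv', 'slope']) could be called right after the model is written to disk, and check_feature_order(file_path, labels) just before random_forest_regression_predict, where labels describes the rasters in the order they are passed to the predict tool.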
See also: random_forest_regression_fit, random_forest_classification_fit, random_forest_classification_predict, knn_classification, svm_classification, parallelepiped_classification, evaluate_training_sites
def random_forest_regression_predict(self, input_rasters: List[Raster], model_bytes: List[int]) -> Raster: ...