Predicting soil aggregate stability using readily available soil properties and machine learning techniques

Authors Javier I. Rivera; Carlos A. Bonilla
Research line Critical Resources
Publication year 2020
Journal Catena
Keywords
Aggregate stability, Machine learning, Neural networks, Soil aggregates, Water-stable aggregates
Abstract Aggregate stability is a measure of soil quality, as the presence of stable aggregates relates to a wide range of soil ecosystem services. However, aggregate stability is not reported in most soil surveys, so predictive models have been the focus of increasing attention as an alternative in the absence of direct measurements. Therefore, the objective of this study was to develop a new model for predicting aggregate stability using two machine learning techniques: an Artificial Neural Network (ANN) model and a Generalized Linear Model (GLM). These techniques were applied to a soil dataset described in terms of soil texture, organic matter content, pH, and water-stable aggregates. This dataset included 109 soil samples obtained at 0–17 cm soil depth from hyperarid, arid, semiarid, and humid regions in Chile, including agricultural soils, shrubland, and forestland. Most soil textures in this dataset were sandy loam, loam, and clay loam, and each soil property had a large range of values. Aggregate stability was measured and computed as the percentage of water-stable aggregates using a wet sieving apparatus, and the ANN and GLM models were constructed and evaluated by repeated cross-validation (80% and 20% of the dataset for training and testing, respectively). The ANN and GLM models were compared by computing the adjusted r² (r²adj), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The results demonstrated a positive gradient of aggregate stability from arid (40% on average) to humid (87% on average) regions, which is related to the increase in organic matter content and the decrease in pH. Organic matter content and pH exhibited a significant correlation with aggregate stability, with r = 0.56 and r = −0.73, respectively. Moreover, among the fractions used to compute the soil texture, the clay content exhibited the highest correlation with aggregate stability (r = 0.30). These variables were used for training and testing the ANN and GLM models.
The ANN model achieved superior performance in terms of the RMSE, r²adj, and MAE in the cross-validation procedure, and showed r² = 0.80 for training and r² = 0.82 for testing. The GLM yielded r² = 0.59 and r² = 0.63 for training and testing, respectively. Therefore, despite the limitations observed when implementing the ANN, its use is recommended instead of the GLM as a reference model. Considering the small number of easily measured variables, this study provides two models that can be coupled with other existing soil routines or used directly to complete soil surveys where aggregate stability was not measured.
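The evaluation described in the abstract (repeated 80/20 splits scored with RMSE, MAE, and an adjusted r²) can be sketched in a few lines of standard-library Python. This is only an illustration, not the authors' code: the function names are hypothetical, the adjusted r² uses the common textbook form 1 − (1 − r²)(n − 1)/(n − p − 1), which the abstract does not spell out, and the random shuffling scheme is an assumption about how the repeated splits were drawn.

```python
import math
import random


def rmse(observed, predicted):
    """Root Mean Square Error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


def mae(observed, predicted):
    """Mean Absolute Error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination (assumes the observations are not all equal)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(observed, predicted, n_predictors):
    """Adjusted r^2 in its usual textbook form (assumed, not taken from the paper)."""
    n = len(observed)
    r2 = r_squared(observed, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def repeated_splits(n_samples, n_repeats, train_fraction=0.8, seed=0):
    """Yield (train_indices, test_indices) for repeated random 80/20 splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


# With the 109 samples reported in the abstract, each repeat uses
# 87 samples for training and 22 for testing.
for train_idx, test_idx in repeated_splits(109, n_repeats=3):
    print(len(train_idx), len(test_idx))
```

In such a setup, a model would be fitted on the training indices of each repeat and scored on the test indices, with `n_predictors` set to the number of input variables (three if organic matter content, pH, and clay content are the inputs, as the abstract suggests).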
Lead author Carlos A. Bonilla