Heavy metals concentration in soils across the conterminous USA: Spatial prediction, model uncertainty, and influencing factors

Sci Total Environ. 2024 Apr 1:919:170972. doi: 10.1016/j.scitotenv.2024.170972. Epub 2024 Feb 13.

Abstract

Assessment and proper management of sites contaminated with heavy metals require precise information on the spatial distribution of these metals. This study aimed to predict and map the distribution of Cd, Cu, Ni, Pb, and Zn across the conterminous USA using point observations, environmental variables, and Histogram-based Gradient Boosting (HGB) modeling. Over 9180 surficial soil observations from the Soil Geochemistry Spatial Database (SGSD) (n = 1150), the Geochemical and Mineralogical Survey of Soils (GMSS) (n = 4857), and the Holmgren Dataset (HD) (n = 3400) were compiled, together with 28 covariates on a 100 m × 100 m grid representing climate, topography, vegetation, soils, and anthropic activity. Model performance was evaluated on a 20 % hold-out set not used for calibration, using the coefficient of determination (R²), the concordance correlation coefficient (ρc), and the root mean square error (RMSE). Prediction uncertainty was calculated as the difference between the 95 % and 5 % quantiles estimated by HGB. The model explained up to 50 % of the variance in the data, with RMSE ranging from 0.16 mg kg⁻¹ for Cu to 23.4 mg kg⁻¹ for Zn. Likewise, ρc ranged from 0.55 (Cu) to 0.68 (Zn), and Zn had the highest R² (0.50) of all predicted metals. We observed high Pb concentrations near urban areas. Peak concentrations of all studied metals were found in the Lower Mississippi River Valley. Cu, Ni, and Zn concentrations were higher on the West Coast, whereas Cd concentrations were higher in the central USA. Clay, pH, potential evapotranspiration, temperature, and precipitation were among the five most important covariates for the spatial prediction of heavy metals. Combining point observations and environmental covariates with machine learning provided a reliable prediction of heavy metal distribution in the soils of the conterminous USA. The updated maps could support environmental assessment, monitoring, and decision-making, and the methodology is applicable to other soil databases worldwide.
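The evaluation and uncertainty steps summarized in the abstract (a 20 % hold-out split, R², ρc, RMSE, and a 5 % to 95 % quantile interval) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: all function names (`holdout_split`, `lins_ccc`, `interval_width`, `pinball_loss`) are hypothetical, the toy numbers are made up rather than taken from the study, and the 5 % and 95 % quantile predictions are simply taken as given. Such quantiles could come, for example, from scikit-learn's `HistGradientBoostingRegressor` fitted with `loss="quantile"` and `quantile=0.05` or `0.95`; whether the study used that exact estimator is an assumption.

```python
import math
import random
from statistics import fmean


def holdout_split(n, test_frac=0.2, seed=42):
    """Randomly split indices 0..n-1 into calibration and hold-out sets.
    Assumption: a simple random split; the study's exact scheme is not stated here."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]  # (calibration, hold-out)


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def r_squared(obs, pred):
    """Coefficient of determination computed as 1 - SS_res / SS_tot."""
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lins_ccc(obs, pred):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    mx, my = fmean(obs), fmean(pred)
    vx = fmean((o - mx) ** 2 for o in obs)
    vy = fmean((p - my) ** 2 for p in pred)
    cov = fmean((o - mx) * (p - my) for o, p in zip(obs, pred))
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


def interval_width(q05, q95):
    """Per-location uncertainty: 95 % quantile minus 5 % quantile."""
    return [hi - lo for lo, hi in zip(q05, q95)]


def pinball_loss(obs, pred_q, alpha):
    """Mean quantile (pinball) loss, the objective minimized when fitting
    the alpha-quantile of the response."""
    return fmean(
        max(alpha * (o - q), (alpha - 1.0) * (o - q)) for o, q in zip(obs, pred_q)
    )


# Hypothetical toy values (not data from the study).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
q05 = [8.0, 15.0, 27.0, 33.0]
q95 = [15.0, 24.0, 38.0, 45.0]

calib_idx, test_idx = holdout_split(100)
print("RMSE:", rmse(obs, pred))
print("R2:", r_squared(obs, pred))
print("CCC:", lins_ccc(obs, pred))
print("5-95 % interval width:", interval_width(q05, q95))
```

Lin's ρc equals the Pearson correlation multiplied by a bias-correction factor no greater than 1, so it penalizes systematic shifts in mean or scale as well as scatter; this is one reason it is commonly reported alongside R² and RMSE.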

Keywords: digital soil mapping; machine learning; metal pollution; prediction uncertainty; soil chemistry; soil contamination.