Soil nitrogen content and pH are two pivotal factors that determine soil fertility and plant growth. As key indicators of soil health, they play distinct yet complementary roles in the soil ecosystem. Nitrogen is an essential nutrient for plant growth, while soil pH directly influences the activity of soil microorganisms. These microbes break down minerals and organic matter, which in turn affects the availability and conversion of key nutrients such as nitrogen and phosphorus. A comprehensive understanding of the distribution of total nitrogen content and pH is therefore crucial for sustaining agricultural production and maintaining soil and ecosystem health. Existing models for estimating soil properties from near-infrared (NIR) spectral data often overlook the spatial non-stationarity of the relationship between soil spectra and soil composition. We therefore propose a new model for estimating soil total nitrogen content and pH that combines geographically neural network weighted regression (GNNWR) with extreme gradient boosting (XGBoost). The model uses neural networks to improve prediction accuracy and captures the spatial heterogeneity of the relationship between spectral reflectance and soil total nitrogen content and pH across regions. The Geographically Neural Network Weighted-eXtreme Gradient Boosting (GNNW-XGBoost) model was applied to soil nutrient and visible near-infrared (Vis-NIR) spectral samples collected by Eurostat in 2009 for the land use and coverage area frame survey of 23 member states of the European Union. The model was trained on the spatial relationship between the reflectance of characteristic spectral bands and soil total nitrogen content and pH to verify its robustness and superiority, and 10-fold cross-validation was used to make the experimental evaluation more reliable.
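The 10-fold cross-validation mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `k_fold_indices`, the sample count of 105, and the random seed are all hypothetical choices made for illustration. Each sample is placed in exactly one held-out fold, and a model would be fitted on the other nine folds in turn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_idx, test_idx) pairs covering every sample exactly once.

    Hypothetical helper for illustration; the study's actual split is not
    described in the abstract.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Shuffle once so that folds are not tied to the original sample order.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    # Each fold serves as the test set once; the rest form the training set.
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# Illustrative sample count only; not the size of the survey dataset.
splits = k_fold_indices(n_samples=105, k=10)
```

A plain random split is used here for simplicity. Whether the study shuffled randomly or grouped samples spatially is not stated in the abstract.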
In model evaluation, the GNNW-XGBoost model demonstrated higher predictive accuracy than the standalone XGBoost and GNNWR models. It achieved the highest coefficient of determination (R²), 0.84 for total nitrogen and 0.80 for pH. Relative to the XGBoost and GNNWR models, respectively, it reduced the root mean square error (RMSE) by 7.64 % and 7.61 % for total nitrogen and by 8.96 % and 4.69 % for pH. This study provides a new method for accurately predicting soil total nitrogen content and pH, and it also offers a useful reference for other estimation problems involving geographic data. Such methods can help improve the accuracy of environmental monitoring, optimize resource management strategies, and promote the development of sustainable agriculture.
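The evaluation metrics named above can be written out as a short sketch. The definitions of R² and RMSE are the standard ones. The relative-reduction formula is an assumption about how the percentage RMSE reductions were computed, and all numeric values below are toy values, not data or results from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse_reduction_pct(rmse_baseline, rmse_model):
    """Percentage by which rmse_model is lower than rmse_baseline.

    Assumed formula: 100 * (baseline - model) / baseline.
    """
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


# Toy observations and predictions (illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

toy_rmse = rmse(y_true, y_pred)           # sqrt(1/4)
toy_r2 = r2_score(y_true, y_pred)         # 1 - 1/5
toy_gain = rmse_reduction_pct(1.0, 0.9)   # about 10 percent
```

Under this convention, a positive percentage means the combined model has a lower RMSE than the baseline it is compared with.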
Keywords: GNNW-XGBoost model; Soil pH and total N; Spatial nonstationarity; Vis-NIR data.
Copyright © 2025 Elsevier B.V. All rights reserved.