The petrochemical industry is a major source of soil pollution, with serious effects on human health and the ecological environment. Rapid, economical and efficient health risk identification for the petrochemical industry in China is therefore of great significance. In this work, an efficient method for human health risk identification was developed based on the extreme gradient boosting (XGBoost) algorithm, in contrast to traditional health risk assessment, which involves complicated procedures. An index system of 13 indicators was established from the perspective of "sources - pathways - receptors" for risk identification. Ten-fold cross validation was used to assess generalization performance, and accuracy, precision and recall were used to evaluate the algorithms. The Wilcoxon signed-rank test was conducted to provide statistical support for the differences between XGBoost and the other models. The results showed that XGBoost, with an accuracy of 0.783, performed significantly better in health risk identification than a multilayer perceptron neural network with error backpropagation training (BPNN), support vector machine (SVM), gradient boosting decision tree (GBDT) and light gradient boosting machine (LightGBM). The most important features contributing to risk identification were, in order, site location (in the industrial zone or not), site planning and production period. In health risk identification, particular attention should be given to petrochemical sites that are not located in an industrial zone and that have a long production period and sensitive receptors. This method provides a useful reference for relevant departments carrying out soil contamination screening and health risk assessment of petrochemical sites.
Keywords: Extreme Gradient Boosting; Health risk identification; Petrochemical sites.
Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.
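The evaluation protocol summarized in the abstract (k-fold cross validation scored by accuracy, precision and recall) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `k_fold_indices`, `binary_metrics` and `cross_validate` are hypothetical, the labels are assumed to be binary, and the `fit` and `predict` callables stand in for whichever classifier (for example an XGBoost model) is being assessed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Assumption: report 0.0 when a ratio is undefined (no predicted or
    # no actual positives in the fold).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Average accuracy, precision and recall over k held-out folds.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns predicted labels; both are placeholders for a real classifier.
    """
    totals = [0.0, 0.0, 0.0]
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = predict(model, [X[i] for i in held_out])
        scores = binary_metrics([y[i] for i in held_out], y_pred)
        totals = [t + s for t, s in zip(totals, scores)]
    return tuple(t / k for t in totals)
```

In practice `fit` and `predict` would wrap the model under comparison, and the per-fold scores of two models could then be compared with a paired test such as the Wilcoxon signed-rank test mentioned above.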