Objective: To predict trends in the fine-scale spread of Oncomelania hupensis snails in Shanghai Municipality using supervised machine learning models, so as to provide evidence for precision O. hupensis snail control.
Methods: Using the 2016 O. hupensis snail survey data from Shanghai Municipality together with climatic, geographical, vegetation and socioeconomic data relating to snail distribution, seven supervised machine learning models were built to predict the risk of snail spread in Shanghai: decision tree, random forest, generalized boosted model, support vector machine, naive Bayes, k-nearest neighbor and C5.0. The predictive performance of the seven models was evaluated with the area under the receiver operating characteristic curve (AUC), F1-score and accuracy (ACC). The best-performing models were then used to identify the environmental variables affecting snail spread and to predict the areas at risk of snail spread in Shanghai Municipality.
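The three evaluation metrics named in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' code. It assumes binary labels (1 = snail spread observed, 0 = not observed), a model that outputs one risk score per site, and a hypothetical 0.5 threshold for turning scores into class predictions. The function names, the threshold and the example data are all hypothetical. AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
from typing import List, Sequence, Tuple

THRESHOLD = 0.5  # hypothetical cut-off for converting risk scores to classes


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen positive site
    gets a higher score than a randomly chosen negative site (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_and_accuracy(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """F1-score for the positive class (spread) and overall accuracy."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return f1, accuracy


def classify(scores: Sequence[float], threshold: float = THRESHOLD) -> List[int]:
    """Label a site as 'spread' (1) when its risk score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    # Hypothetical toy data, NOT values from the study
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
    print("AUC:", auc_score(labels, scores))
    print("F1, ACC:", f1_and_accuracy(labels, classify(scores)))
```

Because AUC does not depend on a threshold while F1-score and accuracy do, a model can rank first on AUC yet second on F1-score and accuracy. The random forest and the generalized boosted model show exactly this pattern in the Results.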
Results: All seven supervised machine learning models were successfully built to predict the risk of snail spread in Shanghai Municipality. The random forest (AUC = 0.901, F1-score = 0.840, ACC = 0.797) and the generalized boosted model (AUC = 0.889, F1-score = 0.869, ACC = 0.835) outperformed the other models. The random forest model showed that the main climatic variables contributing to snail spread in Shanghai were aridity (11.87%), ≥ 0 °C annual accumulated temperature (10.19%), moisture index (10.18%) and average annual precipitation (9.86%). The main vegetation variables were the first-quarter vegetation index (8.30%) and the second-quarter vegetation index (7.69%). Snails were more likely to spread where aridity was < 0.87, ≥ 0 °C annual accumulated temperature was 5 550 to 5 675 °C, moisture index was > 39%, average annual precipitation was > 1 180 mm, the first-quarter vegetation index was > 0.4 and the second-quarter vegetation index was > 0.6. When the predictions were overlaid on water resource district and township administrative maps, the areas at risk of snail spread were mainly located in 10 townships/subdistricts across the Punan West, Punan East and Tainan water resource districts of southern Shanghai.
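The Results report random forest importance as percentages. The abstract does not say which importance measure was used (for example, mean decrease in accuracy or in Gini impurity). The sketch below therefore assumes only that each percentage is a variable's share of the summed importance scores. Permutation-based importance can be negative, so the sketch rejects negative inputs rather than guessing how to handle them. The function name, variable keys and raw scores are hypothetical.

```python
from typing import Dict, List, Tuple


def importance_percentages(raw: Dict[str, float]) -> List[Tuple[str, float]]:
    """Express each variable's importance as a percentage of the total and
    return (variable, percent) pairs, most important first."""
    if any(value < 0 for value in raw.values()):
        raise ValueError("this sketch expects non-negative importance scores")
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    shares = [(name, 100.0 * value / total) for name, value in raw.items()]
    # sorted() is stable, so variables with equal shares keep their input order
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw scores, NOT values from the study
    raw_scores = {"aridity": 3.0, "annual_precipitation": 1.0, "ndvi_q1": 1.0}
    for name, pct in importance_percentages(raw_scores):
        print(f"{name}: {pct:.2f}%")
```

A share of total importance ranks variables within one model. It does not indicate the direction of a variable's effect, so it complements the threshold findings rather than replacing them.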
Conclusions: Supervised machine learning models are effective for predicting the risk of fine-scale O. hupensis snail spread and for identifying the environmental determinants of snail spread. The areas at risk of O. hupensis snail spread are mainly located in southwestern Songjiang District, northwestern Jinshan District and southeastern Qingpu District of Shanghai Municipality.
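[Abstract] Objective: To predict the fine-scale spread of Oncomelania hupensis snails in Shanghai Municipality using supervised machine learning models, so as to provide a basis for precise snail control. Methods: Using the 2016 O. hupensis snail survey data from Shanghai together with climatic, geographical, vegetation and socioeconomic data relating to snail distribution, seven machine learning models (decision tree, random forest, generalized boosted model, support vector machine, naive Bayes, k-nearest neighbor and C5.0) were built to predict the risk of snail spread in Shanghai. The predictive performance of the seven models was evaluated with the area under the receiver operating characteristic curve (area under the curve, AUC), F1-score and accuracy (ACC). The optimal model was then selected to identify the environmental factors associated with snail spread and to predict the risk areas in Shanghai. Results: Seven machine learning models for predicting the risk of snail spread in Shanghai were successfully established. Of these, the random forest model (AUC = 0.901, F1 = 0.840, ACC = 0.797) and the generalized boosted model (AUC = 0.889, F1 = 0.869, ACC = 0.835) performed best. The random forest model showed that the climatic variables with the greatest influence on snail spread in Shanghai were mainly aridity (11.87%), ≥ 0 °C annual accumulated temperature (10.19%), moisture index (10.18%) and average annual precipitation (9.86%). The vegetation variables were mainly the first-quarter vegetation index (8.30%) and the second-quarter vegetation index (7.69%). Among the climatic variables, snail spread was likely at aridity < 0.87, ≥ 0 °C annual accumulated temperature of 5 550 to 5 675 °C, moisture index > 39% and average annual precipitation > 1 180 mm. Among the vegetation factors, snail spread was likely at a first-quarter vegetation index > 0.4 and a second-quarter vegetation index > 0.6. When combined with water resource district and township administrative maps, the areas at risk of snail spread in Shanghai were mainly distributed in 10 subdistricts/towns, covering 3 water resource districts: Punan West, Punan East and Tainan. Conclusions: Supervised machine learning models can be used to predict the risk of snail spread at a fine scale and to assess the environmental factors driving snail spread. The areas at risk of snail spread in Shanghai are mainly distributed in southwestern Songjiang District, northwestern Jinshan District and southeastern Qingpu District.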
Keywords:
Machine learning model;