Background: Long-term surface NO2 data are essential for retrospective policy evaluation and chronic human exposure assessment. In the absence of NO2 observations for Mainland China before 2013, training a model with 2013-2018 data to make predictions for 2005-2012 (back-extrapolation) could cause substantial estimation bias due to concept drift.
Objective: This study aims to correct the estimation bias in order to reconstruct the spatiotemporal distribution of daily surface NO2 levels across China during 2005-2018.
Methods: On the basis of ground- and satellite-based data, we proposed the robust back-extrapolation with a random forest (RBE-RF) to simulate the surface NO2 through intermediate modeling of the scaling factors. For comparison purposes, we also employed a random forest (Base-RF), as a representative of the commonly used approach, to directly model the surface NO2 levels.
Results: The validation against Taiwan's NO2 observations during 2005-2012 showed that RBE-RF adequately corrected the substantial underestimation by Base-RF. The RMSE decreased from 10.1 to 8.2 µg/m3, 7.1 to 4.3 µg/m3, and 6.1 to 2.9 µg/m3 in predicting daily, monthly, and annual levels, respectively. For North China with the most severe pollution, the population-weighted NO2 ([NO2]pw) during 2005-2012 was estimated as 40.2 and 50.9 µg/m3 by Base-RF and RBE-RF, respectively, i.e., 21.0% difference. While both models predicted that the national annual [NO2]pw increased during 2005-2011 and then decreased, the interannual trends were underestimated by >50.2% by Base-RF relative to RBE-RF. During 2005-2018, the nationwide population that lived in the areas with NO2 > 40 µg/m3 were estimated as 259 and 460 million by Base-RF and RBE-RF, respectively.
Conclusion: With RBE-RF, we corrected the estimation bias in back-extrapolation and obtained a full-coverage dataset of daily surface NO2 across China during 2005-2018, which is valuable for environmental management and epidemiological research.
Keywords: Back extrapolation; Concept drift; Exposure assessment; Long term; Machine learning; Nitrogen dioxide.
Copyright © 2021 The Author(s). Published by Elsevier Ltd.. All rights reserved.