Integrating Multiple Data Sources and Learning Models to Predict Infectious Diseases in China

AMIA Jt Summits Transl Sci Proc. 2019 May 6:2019:680-685. eCollection 2019.

Abstract

Outbreaks of infectious diseases not only endanger people's lives and property, but can also cause negative social impact and economic loss. Establishing early warning technologies for infectious diseases is therefore of great value. This study draws on historical morbidity and mortality data from 2012 to 2016 in China for 10 infectious diseases: typhoid fever, hemorrhagic fever with renal syndrome (HFRS), mumps, scarlatina, malaria, dysentery, pertussis, conjunctivitis, pulmonary tuberculosis, and diarrhea. We also integrated search engine query data and seasonal information into the prediction models. Multiple prediction models, including a linear model, a time series analysis model, a boosting tree model, and a deep learning model (recurrent neural network, RNN), were constructed to predict the morbidity of these 10 diseases. The RNN model showed better predictive capability for these diseases than the other models. Improved techniques for infectious disease prediction can contribute to more effective disease prevention.
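
The integration of historical incidence, search engine query data, and seasonal information described above can be illustrated with a minimal sketch of feature construction. This is not the authors' implementation: the abstract does not state the temporal resolution or the feature design, so the monthly time step, the lag window, the sine/cosine month encoding, and the names `build_features`, `cases`, `query_index`, and `months` are all assumptions made for illustration.

```python
import math


def build_features(cases, query_index, months, n_lags=3):
    """Build (X, y) rows for predicting cases[t] from the previous n_lags steps.

    Assumptions (not taken from the paper): monthly data, a fixed lag window,
    and a sine/cosine encoding of the calendar month (1-12) as the seasonal signal.
    """
    if not (len(cases) == len(query_index) == len(months)):
        raise ValueError("all input series must have the same length")

    X, y = [], []
    for t in range(n_lags, len(cases)):
        # Map the target month onto a circle so December and January are adjacent.
        angle = 2 * math.pi * (months[t] - 1) / 12
        row = (
            list(cases[t - n_lags:t])            # lagged incidence counts
            + list(query_index[t - n_lags:t])    # lagged search-query volume
            + [math.sin(angle), math.cos(angle)]  # seasonal information
        )
        X.append(row)
        y.append(cases[t])
    return X, y
```

Each row of `X` could then be passed to any of the model families named in the abstract (linear, boosting tree, or a recurrent network after reshaping into sequences), with `y` as the target incidence.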