Following emerging, re-emerging, and endemic pathogen outbreaks, the rush to publish and the risk of data misrepresentation, misinterpretation, and even misinformation put a greater onus on methodological rigor, which includes revisiting initial assumptions as new evidence becomes available. This study sought to understand how and when early evidence emerges and evolves for different types of recurring pathogen-related questions. Claims in the coronavirus disease 2019 (COVID-19) scientific literature were matched against a set of expert-curated evidence using deep learning natural language processing (NLP), and patterns in timing across different COVID-19 questions and answers were identified, to build a framework for characterizing uncertainty in emerging infectious disease (EID) research over time. COVID-19 was chosen as a use case for this framework given the large and accessible datasets curated for scientists during the beginning of the pandemic. Timing patterns in reliably answering broad COVID-19 questions often did not align with general publication patterns, but early expert-curated evidence was generally stable. Because instability in answers often occurred within the first 2 to 6 months for specific COVID-19 topics, public health officials could apply more conservative policies at the start of future pandemics, to be revised as evidence stabilizes.
Keywords: SARS-CoV-2; natural language processing; pandemics; public health; uncertainty.