The impact of sampling bias on viral phylogeographic reconstruction

PLOS Glob Public Health. 2022 Sep 28;2(9):e0000577. doi: 10.1371/journal.pgph.0000577. eCollection 2022.

Abstract

Genomic epidemiology plays an ever-increasing role in our understanding of, and response to, the spread of infectious pathogens. Phylogeography, the reconstruction of the historical location and movement of pathogens from the evolutionary relationships among sampled pathogen sequences, can inform policy decisions related to viral movement among jurisdictions. However, sampling and virus sequencing policies differ among jurisdictions, and these differences can bias phylogeographic reconstructions. Here we assess the potential impacts of geographically biased sampling on estimated past viral locations and on whether key viral movements can be detected. We quantify the effect of bias using simulated phylogenies with known geographic histories, and determine how biased sampling and the underlying migration rate affect the accuracy of estimated past viral locations. We find that, overall, the accuracy of phylogeographic reconstruction is high, particularly when the migration rate is low. However, results depend on sampling, and sampling bias can have a large impact on the number and nature of estimated migration events. We apply these insights to the geographic spread of Ebolavirus in the 2014-2016 West Africa epidemic. This work highlights how sampling policy can both affect geographic inference and be optimized to ensure the accuracy of specific features of geographic spread.

Grants and funding

C.C. is supported by the Government of Canada’s Canada 150 Research Chair program. A.M. is supported by a Natural Sciences and Engineering Research Council of Canada grant (RGPIN-2022-03113). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.