Adaptive Preferential Sampling in Phylodynamics With an Application to SARS-CoV-2

J Comput Graph Stat. 2022;31(2):541-552. doi: 10.1080/10618600.2021.1987256. Epub 2021 Nov 29.

Abstract

Longitudinal molecular data from rapidly evolving viruses and pathogens provide information about disease spread and complement traditional surveillance approaches based on case counts. The coalescent is used to model the genealogy that represents the ancestral relationships among the samples. Its basic assumption is that coalescent events occur at a rate inversely proportional to the effective population size N_e(t), a time-varying measure of genetic diversity. When the sampling process (the collection of samples over time) depends on N_e(t), the coalescent and sampling processes can be modeled jointly to improve estimation of N_e(t); failing to do so can introduce bias through model misspecification. However, the way the sampling process depends on the effective population size may vary over time. We introduce an approach in which the sampling process is modeled as an inhomogeneous Poisson process whose rate equals the product of N_e(t) and a time-varying coefficient, making minimal assumptions about the functional shapes of both via Markov random field priors. We provide efficient inference algorithms, compare the model's performance with that of alternative methods in a simulation study, and apply the model to SARS-CoV-2 sequences from Los Angeles and Santa Clara counties. The methodology is implemented and available in the R package adapref. Supplementary files for this article are available online.
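The joint model described above combines three log-density terms: a coalescent likelihood with rate inversely proportional to N_e(t), a Poisson-process likelihood for the sampling times with rate equal to a time-varying coefficient times N_e(t), and MRF priors on both curves. The sketch below illustrates these terms under stated assumptions: N_e(t) and the coefficient are taken to be piecewise constant on a fixed time grid and are parameterized on the log scale (the names `log_ne` and `log_coef` are hypothetical), and a first-order random-walk Gaussian Markov random field is used as one common choice of MRF prior. It is not the adapref implementation or its API.

```python
import math
from typing import Iterable, Sequence, Tuple


def coalescent_loglik(segments: Iterable[Tuple[float, int, float, bool]]) -> float:
    """Coalescent log-likelihood over segments (width, k, log_ne, ends_in_coalescence).

    Within a segment, k lineages are present and N_e is constant, so the
    coalescent rate is C(k, 2) / N_e, inversely proportional to N_e.
    """
    total = 0.0
    for width, k, log_ne, ends_in_coal in segments:
        rate = math.comb(k, 2) * math.exp(-log_ne)
        total -= rate * width          # probability of no coalescence over the segment
        if ends_in_coal:
            total += math.log(rate)    # density of the coalescent event at its end
    return total


def sampling_loglik(counts: Sequence[int], widths: Sequence[float],
                    log_ne: Sequence[float], log_coef: Sequence[float]) -> float:
    """Inhomogeneous Poisson log-likelihood of sampling times on a grid.

    Interval i has width widths[i], contains counts[i] observed sampling times,
    and has rate lambda_i = exp(log_coef[i]) * exp(log_ne[i]), i.e. a
    time-varying coefficient times N_e. Returns sum_i n_i*log(lambda_i) - lambda_i*w_i.
    """
    total = 0.0
    for n, w, g, b in zip(counts, widths, log_ne, log_coef):
        log_rate = b + g
        total += n * log_rate - math.exp(log_rate) * w
    return total


def rw1_logprior(x: Sequence[float], precision: float) -> float:
    """First-order random-walk GMRF log-density, up to an additive constant.

    Assumes increments x[i+1] - x[i] ~ Normal(0, 1 / precision).
    """
    m = len(x) - 1
    sq = sum((x[i + 1] - x[i]) ** 2 for i in range(m))
    return 0.5 * m * math.log(precision) - 0.5 * precision * sq
```

In this sketch, an unnormalized joint log posterior would add `coalescent_loglik`, `sampling_loglik`, and one `rw1_logprior` term each for `log_ne` and `log_coef`. Letting `log_coef` vary over the grid, rather than fixing it to a constant, is what allows the dependence of sampling on N_e(t) to change over time.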

Keywords: Coalescent process; Markov random fields; Poisson processes; Population size.