Two fitness inference schemes compared using allele frequencies from 1068 391 sequences sampled in the UK during the COVID-19 pandemic

Hong-Li Zeng; Cheng-Long Yang; Bo Jing; John Barton; Erik Aurell

doi:10.1088/1478-3975/ad9213

Phys Biol. 2024 Nov 21;22(1). doi: 10.1088/1478-3975/ad9213.

Authors

Hong-Li Zeng¹, Cheng-Long Yang¹, Bo Jing¹, John Barton², Erik Aurell³

Affiliations

¹ School of Science, Nanjing University of Posts and Telecommunications, Key Laboratory of Radio and Micro-Nano Electronics of Jiangsu Province, Nanjing 210023, People's Republic of China.
² Department of Computational & Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, United States of America.
³ Department of Computational Science and Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden.

PMID: 39536448
DOI: 10.1088/1478-3975/ad9213

Abstract

Throughout the course of the SARS-CoV-2 pandemic, genetic variation has contributed to the spread and persistence of the virus. For example, various mutations have allowed SARS-CoV-2 to escape antibody neutralization or to bind more strongly to the receptors it uses to enter human cells. Here, we compared two methods that estimate the fitness effects of viral mutations from the abundant sequence data gathered over the course of the pandemic. Both approaches are grounded in population genetics theory but rest on different assumptions. The first, transient quasi-linkage equilibrium (tQLE), features an epistatic fitness landscape and assumes that alleles are nearly in linkage equilibrium. The second, marginal path likelihood (MPL), assumes a simple additive fitness landscape but allows for any level of correlation between alleles. We characterized differences in the distributions of fitness values inferred by each approach and in the ranks of fitness values that they assign to sequences across time. We find that in a large fraction of weeks the two methods agree well on their top-ranked sequences, i.e. on which of the sequences observed that week are most fit. We also find that the agreement between the two rankings varies with the genetic unimodality of the population in a given week.
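To make the contrast between the two landscapes concrete, the sketch below is a minimal illustration, not the authors' code. It assumes binary alleles (1 = mutant), omits the mutation term, approximates the time integral with a left-endpoint sum, and uses an L2 regularizer `gamma`; under those assumptions `mpl_selection` follows the general form of the published MPL estimator, s_hat = (sum_k dt_k C(t_k) + gamma I)^-1 (x(t_K) - x(t_0)). The epistatic function only shows the kind of pairwise landscape tQLE works with. Its couplings `J` are hypothetical inputs, not tQLE inference. `top_k_overlap` is one simple agreement measure and may differ from the metric used in the paper. All function names are illustrative.

```python
import numpy as np


def mpl_selection(seq_samples, times, gamma=1.0):
    """Hypothetical MPL-style estimate of additive selection coefficients.

    seq_samples: list of (N_k, L) binary arrays (1 = mutant) sampled at `times`.
    Simplifications (assumptions): no mutation term, left-endpoint time integral.
    """
    freqs = [np.asarray(s, dtype=float).mean(axis=0) for s in seq_samples]
    n_sites = freqs[0].shape[0]
    integrated_cov = np.zeros((n_sites, n_sites))
    for k in range(len(times) - 1):
        s = np.asarray(seq_samples[k], dtype=float)
        pair = s.T @ s / s.shape[0]                 # pair frequencies x_ij (x_ii = x_i)
        cov = pair - np.outer(freqs[k], freqs[k])   # C_ij = x_ij - x_i x_j
        integrated_cov += (times[k + 1] - times[k]) * cov
    dx = freqs[-1] - freqs[0]                       # net allele-frequency change
    # Solve (sum_k dt_k C(t_k) + gamma I) s_hat = x(t_K) - x(t_0)
    return np.linalg.solve(integrated_cov + gamma * np.eye(n_sites), dx)


def additive_fitness(genotypes, s):
    """F(g) = sum_i s_i g_i, the additive landscape assumed by MPL."""
    return genotypes @ s


def epistatic_fitness(genotypes, h, J):
    """F(g) = sum_i h_i g_i + sum_{i<j} J_ij g_i g_j (J symmetric, zero diagonal)."""
    return genotypes @ h + 0.5 * np.einsum("ni,ij,nj->n", genotypes, J, genotypes)


def top_k_overlap(f_a, f_b, k):
    """Fraction of the k highest-fitness sequences shared by two fitness assignments."""
    top_a = set(np.argsort(-f_a, kind="stable")[:k].tolist())
    top_b = set(np.argsort(-f_b, kind="stable")[:k].tolist())
    return len(top_a & top_b) / k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data, not real SARS-CoV-2 sequences: 5 binary sites, two time points.
    early = rng.integers(0, 2, size=(200, 5))
    late = rng.integers(0, 2, size=(200, 5))
    s_hat = mpl_selection([early, late], times=[0.0, 1.0], gamma=1.0)

    week = rng.integers(0, 2, size=(50, 5))         # sequences observed in one "week"
    J = np.zeros((5, 5))
    J[0, 1] = J[1, 0] = -0.4                        # hypothetical epistatic coupling
    f_add = additive_fitness(week, s_hat)
    f_epi = epistatic_fitness(week, s_hat, J)
    print("top-10 overlap:", top_k_overlap(f_add, f_epi, k=10))
```

Binary genotypes often tie in fitness, so the stable sort here breaks ties by sequence index. A real week-by-week comparison would need a deliberate tie-handling rule.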

Keywords: SARS-CoV-2; allele frequency time series; fitness inference; marginal path likelihood (MPL); transient quasi-linkage equilibrium (tQLE).

Creative Commons Attribution license.

Publication types

Comparative Study

MeSH terms

COVID-19* / epidemiology
COVID-19* / genetics
COVID-19* / virology
Gene Frequency*
Genetic Fitness
Humans
Models, Genetic
Mutation
Pandemics
SARS-CoV-2* / genetics
United Kingdom / epidemiology

Grants and funding

R35 GM138233/GM/NIGMS NIH HHS/United States