Motivation: A main challenge in molecular evolution is to find computationally efficient mutation models with flexible assumptions that properly reflect genetic variation. The infinite sites model assumes that each mutation event occurs at a site that has never previously mutated, i.e. it does not allow recurrent mutations. This assumption is reasonable for low mutation rates and makes statistical inference much more tractable. However, recurrent mutations are common enough to be observable in genetic variation data, even in species with low per-site mutation rates such as humans. The finite sites model, on the other hand, allows for recurrent mutations but is computationally infeasible to work with in most cases. In this work, we bridge these two approaches by developing a novel molecular evolution model, the almost infinite sites model (AISM), that both admits recurrent mutations and is tractable. We provide a recursive characterization of the likelihood of our proposed model under complete linkage and outline a parsimonious approximation scheme for computing it.
Results: We demonstrate the usefulness of our model on simulated data and on human mitochondrial data. Our results show that the AISM, in combination with a constraint on the total number of mutation events, can recover accurate approximations to the maximum likelihood estimator of the mutation rate.
Availability and implementation: An implementation of our model is freely available along with code for reproducing our computational experiments at https://github.com/Cronjaeger/almost-infinite-sites-recursions.
Keywords: Coalescent; Finite sites; Infinite sites; Molecular evolution; Parsimony; Sampling distribution.
Copyright © 2024. Published by Elsevier Inc.