Inferring Local Genealogies on Closely Related Genomes

Comp Genom. 2017:10562:213-231. doi: 10.1007/978-3-319-67979-2_12. Epub 2017 Sep 15.

Abstract

The relationship between the evolution of a set of genomes and that of the individual loci therein can be very complex. For example, in eukaryotic species, meiotic recombination combined with the effects of random genetic drift results in loci whose genealogies differ from each other as well as from the phylogeny of the species or populations, a phenomenon known as incomplete lineage sorting (ILS). The most common practice for inferring local genealogies of individual loci is to slide a fixed-width window across an alignment of the genomes and to infer a phylogenetic tree from the sequence alignment of each window. However, at the evolutionary scale where ILS is extensive, the phylogenetic signal within each window is often too weak to infer an accurate local genealogy. In this paper, we propose a hidden Markov model (HMM) based method for inferring local genealogies conditional on a known species tree. The method borrows ideas from the work on coalescent HMMs, yet approximates the model parameterization to focus on computationally efficient inference of local genealogies rather than on obtaining detailed model parameters. We also show how the method can be extended to cases that involve hybridization in addition to recombination and ILS. We demonstrate the performance of our method on synthetic data and one empirical data set, and compare it to the sliding-window approach, which is arguably the most commonly used technique for inferring local genealogies.
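
As a rough illustration of the sliding-window baseline described above, and not the authors' implementation, the following Python sketch cuts an alignment into fixed-width windows and counts the variable sites in each one. The function names `sliding_windows` and `variable_sites`, the dictionary representation of the alignment, and the use of the variable-site count as a crude proxy for phylogenetic signal are all assumptions made for this example. The tree-inference step applied to each window is omitted.

```python
from typing import Dict, Iterator, Tuple


def sliding_windows(
    alignment: Dict[str, str], width: int, step: int
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (start, sub-alignment) for each full fixed-width window.

    `alignment` maps a taxon name to its aligned sequence (hypothetical
    representation chosen for this sketch).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Only full windows are produced; a trailing partial window is dropped.
    for start in range(0, n_sites - width + 1, step):
        yield start, {
            name: seq[start:start + width] for name, seq in alignment.items()
        }


def variable_sites(window: Dict[str, str]) -> int:
    """Count alignment columns in which the sequences are not all identical."""
    columns = zip(*window.values())
    return sum(1 for column in columns if len(set(column)) > 1)


if __name__ == "__main__":
    # Toy alignment of three closely related sequences (made-up data).
    toy = {
        "A": "ACGTACGTAC",
        "B": "ACGTTCGTAC",
        "C": "ACGAACGTAC",
    }
    for start, window in sliding_windows(toy, width=5, step=5):
        print(start, variable_sites(window))
```

On closely related genomes many windows contain few or no variable sites, which is one way to see why a tree inferred from a single window can be unreliable.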