Ancestry inference in complex admixtures via variable-length Markov chain linkage models

J Comput Biol. 2013 Mar;20(3):199-211. doi: 10.1089/cmb.2012.0088. Epub 2013 Feb 19.

Abstract

Inferring the ancestral origin of chromosomal segments in admixed individuals is key for genetic applications ranging from analyzing population demographics and history to mapping disease genes. Previous methods addressed ancestry inference by using either weak models of linkage disequilibrium or large models that make explicit use of ancestral haplotypes. In this paper we introduce ALLOY, an efficient method that incorporates generalized but highly expressive linkage disequilibrium models. ALLOY applies a factorial hidden Markov model to capture the parallel processes producing the maternal and paternal admixed haplotypes, and models the background linkage disequilibrium in the ancestral populations via an inhomogeneous variable-length Markov chain. We test ALLOY across a broad range of scenarios, from recent to ancient admixtures, with up to four ancestral populations. We show that ALLOY outperforms the previous state of the art and is robust to uncertainties in model parameters.
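
To make the factorial hidden Markov model idea concrete, the following is a minimal sketch of a forward pass over pairs of (maternal, paternal) ancestry states. It is not ALLOY's implementation: it assumes markers are independent given ancestry, so it omits the variable-length Markov chain model of background linkage disequilibrium, and the function name `forward_loglik`, the `switch_prob` parameter, and the uniform-jump transition model are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def forward_loglik(genotypes, freqs, switch_prob):
    """Log-likelihood of an unphased genotype sequence under a toy
    two-chain (maternal, paternal) ancestry model.

    genotypes   : sequence of 0/1/2 alternate-allele counts, one per marker
    freqs       : array of shape (K, M); alternate-allele frequency of each
                  of K ancestral populations at each of M markers
    switch_prob : assumed per-marker probability that a haplotype
                  redraws its ancestry (hypothetical parameter)
    """
    K, M = freqs.shape
    # Joint hidden state: one ancestry label per parental haplotype.
    states = list(itertools.product(range(K), repeat=2))

    # Single-chain transition: keep the ancestry with prob 1 - switch_prob,
    # otherwise redraw it uniformly (possibly landing on the same label).
    single = (1 - switch_prob) * np.eye(K) + switch_prob / K
    # The two chains evolve independently, so the joint transition
    # matrix is the Kronecker product of the single-chain matrices.
    joint = np.kron(single, single)

    def emission(g, j):
        # P(genotype g at marker j | ancestry pair), assuming the two
        # alleles are drawn independently from their populations.
        probs = np.empty(len(states))
        for s, (a, b) in enumerate(states):
            p, q = freqs[a, j], freqs[b, j]
            probs[s] = [(1 - p) * (1 - q), p * (1 - q) + (1 - p) * q, p * q][g]
        return probs

    # Uniform prior over ancestry pairs at the first marker.
    alpha = emission(genotypes[0], 0) / len(states)
    loglik = 0.0
    for j in range(1, M):
        # Rescale at every step to avoid numerical underflow.
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = (alpha / scale) @ joint * emission(genotypes[j], j)
    return loglik + np.log(alpha.sum())
```

With K populations the joint state space has K² ancestry pairs, which stays small for the up-to-four-population scenarios mentioned above. A fuller model would replace the independent per-marker emissions with probabilities conditioned on a variable-length context of preceding alleles.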

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • Gene Pool*
  • Genealogy and Heraldry*
  • Genetic Linkage*
  • Haplotypes / genetics
  • Humans
  • Linkage Disequilibrium / genetics
  • Markov Chains*
  • Models, Genetic*