A population genetic hidden Markov model for detecting genomic regions under selection

Mol Biol Evol. 2010 Jul;27(7):1673-85. doi: 10.1093/molbev/msq053. Epub 2010 Feb 25.

Abstract

Recently, hidden Markov models have been applied to numerous problems in genomics. Here, we introduce an explicit population genetics hidden Markov model (popGenHMM) that uses single nucleotide polymorphism (SNP) frequency data to identify genomic regions that have experienced recent selection. Our popGenHMM assumes that SNP frequencies are emitted independently following diffusion approximation expectations but that neighboring SNP frequencies are partially correlated by selective state. We give results from the training and application of our popGenHMM to a set of early release data from the Drosophila Population Genomics Project (dpgp.org) that consists of approximately 7.8 Mb of resequencing from 32 North American Drosophila melanogaster lines. These results demonstrate the potential utility of our model, making predictions based on the site frequency spectrum (SFS) for regions of the genome that represent selected elements.
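
The model structure described in the abstract can be illustrated with a minimal sketch: each SNP's derived-allele count is emitted independently given a hidden selective state, and neighboring SNPs are coupled only through the state transitions. Everything specific here is an assumption made for illustration and is not taken from the paper: the two state names, the power-law "selected" spectrum, the `stay` transition probability, the toy counts, and the use of Viterbi decoding. Only the neutral spectrum (proportional to 1/i, the standard neutral expectation) and the sample size of 32 lines have a basis outside this sketch.

```python
import math

def neutral_sfs(n):
    """Standard neutral expectation: P(count = i) proportional to 1/i, i = 1..n-1."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

def skewed_sfs(n, power=2.0):
    """Hypothetical 'selected' spectrum with an excess of rare alleles (assumption)."""
    weights = [1.0 / i ** power for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

def viterbi(counts, emissions, stay=0.95):
    """Most likely state path for a sequence of derived-allele counts.

    emissions maps state name -> list of probabilities indexed by count - 1.
    stay is an assumed probability of remaining in the same state.
    """
    states = list(emissions)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (len(states) - 1))

    def step(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial state distribution (assumption).
    score = {s: -math.log(len(states)) + math.log(emissions[s][counts[0] - 1])
             for s in states}
    back = []
    for c in counts[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + step(p, s))
            pointers[s] = best_prev
            new_score[s] = (score[best_prev] + step(best_prev, s)
                            + math.log(emissions[s][c - 1]))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

n = 32  # number of sampled lines, as in the abstract
emissions = {"neutral": neutral_sfs(n), "selected": skewed_sfs(n)}

# Toy data: intermediate-frequency SNPs flanking a run of singletons.
counts = [10, 12, 15, 8, 20] + [1] * 10 + [9, 14, 11, 18, 7]
path = viterbi(counts, emissions)
print(path)
```

In this toy input the run of singletons is long enough to outweigh the cost of two state switches, so the decoded path labels that run as the hypothetical "selected" state and the flanking SNPs as "neutral". A trained model would estimate the emission and transition parameters from data rather than fixing them by hand.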

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Drosophila melanogaster / genetics*
  • Markov Chains*
  • Metagenomics*
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*
  • Selection, Genetic*