The site-frequency spectrum of linked sites

Bull Math Biol. 2011 Mar;73(3):459-94. doi: 10.1007/s11538-010-9534-3. Epub 2010 Mar 27.

Abstract

The site-frequency spectrum, representing the distribution of allele frequencies at a set of polymorphic sites, is a commonly used summary statistic in population genetics. Explicit forms of the spectrum are known for models both with and without selection if independence among sites is assumed. The availability of these explicit forms has allowed for maximum likelihood estimation of selection, developed first in the Poisson random field model of Sawyer and Hartl, which is now the primary method for estimating selection directly from DNA sequence data. The independence assumption, which amounts to assuming free recombination between sites, is, however, a limiting case for many population genetics models. Here, we extend the site-frequency spectrum theory to consider the case where the sites are completely linked. We use a diffusion approximation to calculate the joint distribution of the allele frequencies of linked sites for models without selection and for models with equal coefficient selection. The joint distribution is derived by first constructing Green's functions corresponding to multiallele diffusion equations. We show that the site-frequency spectrum is highly correlated between frequencies that are complementary (i.e., sum to 1), and the correlation is significantly elevated by positive selection. The results presented here can be used to extend the Poisson random field to allow for estimating selection for correlated sites. More generally, the Green's function construction should be able to aid in studying the genetic drift of multiple alleles in other cases.
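
As a concrete illustration of the summary statistic discussed above, the following minimal Python sketch tabulates an unfolded site-frequency spectrum from a small 0/1 haplotype matrix and evaluates the standard neutral expectation under independent sites, theta / i for derived-allele count i. It is not the paper's linked-site derivation: the function names, the toy data, and the value of theta are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from a list of equal-length 0/1 sequences (1 = derived allele).

    Entry i-1 of the result is the number of sites at which the derived
    allele appears in exactly i of the n sampled sequences, for i = 1..n-1.
    Monomorphic sites (count 0 or n) are excluded.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Derived-allele count at each site (column sums of the matrix).
    counts = Counter(sum(h[j] for h in haplotypes) for j in range(num_sites))
    return [counts.get(i, 0) for i in range(1, n)]


def neutral_expected_sfs(n, theta):
    """Expected unfolded SFS under neutrality with independent sites.

    Uses the classical result E[xi_i] = theta / i for i = 1..n-1, where
    theta is the population-scaled mutation rate (an assumed input here).
    """
    return [theta / i for i in range(1, n)]


# Hypothetical toy sample: 4 sequences, 4 sites.
sample = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
observed = site_frequency_spectrum(sample)
expected = neutral_expected_sfs(len(sample), theta=6.0)
print(observed)
print(expected)
```

The observed spectrum is a vector of counts per frequency class, which is the quantity whose joint distribution across classes becomes correlated once sites are linked rather than independent.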

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computer Simulation
  • Gene Frequency
  • Genetic Drift
  • Genetics, Population / methods*
  • Models, Genetic*