A non-zero variance of Tajima's estimator for two sequences even for infinitely many unlinked loci

Theor Popul Biol. 2018 Jul;122:22-29. doi: 10.1016/j.tpb.2017.03.002. Epub 2017 Mar 21.

Abstract

The population-scaled mutation rate, $\theta$, is informative about the effective population size and is thus widely used in population genetics. We show that for two sequences and $n$ unlinked loci, the variance of Tajima's estimator ($\hat{\theta}$), which is the average number of pairwise differences, does not vanish even as $n \to \infty$. The non-zero variance of $\hat{\theta}$ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by gene genealogies at all loci. We derive the correlation coefficient under a diploid, discrete-time Wright-Fisher model, and we also derive a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small ($\mathrm{Var}(\hat{\theta})/\theta^2 \approx O(N_e^{-1})$), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.
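
Why a residual correlation keeps the variance from vanishing can be seen from the variance of a mean of correlated terms. The sketch below is an illustrative model, not the paper's derivation: it assumes infinite-sites mutation with per-locus pairwise differences X_i | T_i ~ Poisson(theta * T_i), coalescence times scaled so that E[T_i] = Var(T_i) = 1, conditional independence of mutations across loci, and a single common correlation rho between coalescence times at any two distinct loci. Under these assumptions Var(X_i) = theta + theta^2 and Cov(X_i, X_j) = theta^2 * rho, so the variance of the mean over n loci tends to theta^2 * rho rather than to zero. The function names and the value rho = 1e-4 are hypothetical, chosen only to mimic a correlation of order 1/N_e.

```python
def var_theta_hat(theta: float, n: int, rho: float) -> float:
    """Variance of Tajima's estimator (mean pairwise differences over n loci).

    Hypothetical illustrative model (an assumption, not the paper's method):
    - two sequences per locus, X_i | T_i ~ Poisson(theta * T_i);
    - coalescence times scaled so that E[T_i] = Var(T_i) = 1;
    - any two distinct loci have correlation rho between T_i and T_j;
    - given the T's, mutations at different loci are independent.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Var(X_i) = E[Var(X_i | T_i)] + Var(E[X_i | T_i]) = theta + theta^2
    per_locus_var = theta + theta ** 2
    # Cov(X_i, X_j) = Cov(theta * T_i, theta * T_j) = theta^2 * rho
    between_locus_cov = theta ** 2 * rho
    # Var of the mean of n equicorrelated terms: Var/n + (n - 1)/n * Cov
    return per_locus_var / n + (n - 1) / n * between_locus_cov


def limiting_var(theta: float, rho: float) -> float:
    """Limit of var_theta_hat as n -> infinity under the same assumptions."""
    return theta ** 2 * rho


if __name__ == "__main__":
    theta = 1.0
    rho = 1e-4  # hypothetical value, roughly 1/N_e for N_e = 10^4
    for n in (10, 1_000, 100_000, 10_000_000):
        print(f"n={n:>12,}  Var(rho={rho}) = {var_theta_hat(theta, n, rho):.6e}"
              f"  Var(rho=0) = {var_theta_hat(theta, n, 0.0):.6e}")
    print(f"limit as n -> infinity: {limiting_var(theta, rho):.6e}")
```

With rho = 0 the variance shrinks like 1/n, as commonly assumed for unlinked loci; with any rho > 0 it levels off at theta^2 * rho, so Var(theta_hat)/theta^2 approaches rho.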

Keywords: Coalescent theory; Effective population size; Genealogies; Heterozygosity; Pedigrees; Recombination.

MeSH terms

  • Computer Simulation
  • Demography
  • Female
  • Genealogy and Heraldry
  • Genetic Loci*
  • Genetic Variation
  • Genetics, Population / methods*
  • Heterozygote
  • Humans
  • Male
  • Models, Genetic*
  • Mutation Rate
  • Pedigree*
  • Population Density
  • Sequence Analysis