The aggregate site frequency spectrum for comparative population genomic inference

Mol Ecol. 2015 Dec;24(24):6223-40. doi: 10.1111/mec.13447. Epub 2015 Dec 12.

Abstract

Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavour that has increasing potential to integrate with community ecology. New sequencing technology now makes it possible to perform complex demographic inference at unprecedented resolution across assemblages of nonmodel species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an extension of the site frequency spectrum that uses single nucleotide polymorphism (SNP) data sets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples. We then demonstrate how the aSFS can differentiate various multispecies demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate the degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a data set consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent postglacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application in multilevel statistical frameworks for testing models involving assemblages and/or communities, and as large-scale SNP data from nonmodel species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference.
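The abstract does not spell out how per-population spectra are combined. The sketch below illustrates one plausible construction, and every detail is an assumption rather than the authors' code. It assumes that each population sample has the same number n of haploid sequences, that each population's SFS is folded (so allele polarity is not needed), and that each frequency class is then sorted across populations. Sorting removes taxon labels, so the summary does not depend on which species is which, a property that matters when taxa are treated as exchangeable in an hierarchical model. The function names `folded_sfs` and `aggregate_sfs` are hypothetical.

```python
from typing import List, Sequence


def folded_sfs(allele_counts: Sequence[int], n: int) -> List[int]:
    """Folded SFS for one population sample of n haploid sequences.

    allele_counts holds, for each SNP, the count of one allele among the
    n sequences. Folding maps a count k to min(k, n - k), so ancestral vs
    derived polarity is not required. Returns bins for minor-allele counts
    1..n//2; monomorphic sites (minor count 0) are ignored.
    """
    sfs = [0] * (n // 2)
    for k in allele_counts:
        if not 0 <= k <= n:
            raise ValueError(f"allele count {k} outside 0..{n}")
        minor = min(k, n - k)
        if minor > 0:
            sfs[minor - 1] += 1
    return sfs


def aggregate_sfs(per_pop_counts: Sequence[Sequence[int]], n: int) -> List[List[int]]:
    """Hypothetical aSFS sketch (assumes equal sample size n per population).

    1. Compute the folded SFS of every population.
    2. Within each frequency class, sort the counts across populations in
       descending order, discarding which population each count came from.
    Row i of the result holds the i-th largest count in every bin.
    """
    rows = [folded_sfs(counts, n) for counts in per_pop_counts]
    n_bins = n // 2
    # One sorted column per frequency class.
    columns = [sorted((row[b] for row in rows), reverse=True) for b in range(n_bins)]
    # Transpose the sorted columns back into a populations-by-bins table.
    return [[columns[b][i] for b in range(n_bins)] for i in range(len(rows))]


if __name__ == "__main__":
    pop_a = [1, 1, 2, 3, 5]   # made-up allele counts, n = 6 sequences
    pop_b = [2, 2, 4, 0, 6]
    print(aggregate_sfs([pop_a, pop_b], 6))
    print(aggregate_sfs([pop_b, pop_a], 6))  # same table: order does not matter
```

With the toy counts above, both calls return `[[3, 3, 1], [0, 1, 0]]`, showing that swapping the order of the populations leaves this sorted summary unchanged. Unequal sample sizes would require projecting each spectrum down to a common n first, which this sketch does not attempt.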

Keywords: allele/site frequency spectrum; approximate Bayesian computation; co-expansion; comparative phylogeography; demographic inference; hierarchical modelling; population genomics.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Bayes Theorem
  • Computer Simulation
  • Gene Frequency
  • Genetics, Population / methods*
  • Metagenomics / methods*
  • Models, Genetic*
  • Phylogeography
  • Polymorphism, Single Nucleotide
  • Population Density
  • Smegmamorpha / genetics

Associated data

  • Dryad/10.5061/dryad.B6VH6
  • GENBANK/SRX015871
  • GENBANK/SRX015877