Quantifying population genetic differentiation from next-generation sequencing data

Matteo Fumagalli¹; Filipe G Vieira; Thorfinn Sand Korneliussen; Tyler Linderoth; Emilia Huerta-Sánchez; Anders Albrechtsen; Rasmus Nielsen

doi:10.1534/genetics.113.154740

Genetics. 2013 Nov;195(3):979-92. doi: 10.1534/genetics.113.154740. Epub 2013 Aug 26.

Affiliation

¹ Department of Integrative Biology, University of California, Berkeley, California 94720.

Abstract

Over the past few years, new high-throughput DNA sequencing technologies have dramatically increased sequencing speed and reduced sequencing costs. However, the use of these technologies is often challenged by errors and biases associated with the bioinformatic methods used to analyze the data. In particular, naïve methods for identifying polymorphic sites and inferring genotypes can inflate the estimates obtained in downstream analyses. Recently, explicit modeling of genotype probability distributions has been proposed as a way to take genotype-call uncertainty into account. Building on this idea, we propose a novel method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy for investigating population structure via principal components analysis. Through extensive simulations, we compare the proposed method with approaches based on genotype calling and demonstrate a marked improvement in estimation accuracy across a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage and find that we can infer the fine-scale genetic structure of the sampled individuals, suggesting that this method is useful for investigating the genetic relationships of populations sampled at low coverage.
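The central idea above, carrying genotype uncertainty forward as a probability distribution instead of committing to a hard genotype call, can be illustrated with a minimal sketch. It assumes that per-individual genotype likelihoods P(reads | g) are already available for each site and that a minor allele frequency has been estimated beforehand for each site; both inputs are assumptions for illustration. The function names (`genotype_posteriors`, `expected_genotype`, `covariance_matrix`) are hypothetical. The sketch shows posterior genotype probabilities under a Hardy-Weinberg prior and a covariance matrix of expected genotypes of the kind that could feed a principal components analysis. It does not reproduce the authors' exact estimator, and it does not compute FST.

```python
from typing import List, Sequence


def genotype_posteriors(likelihoods: Sequence[float], freq: float) -> List[float]:
    """Posterior P(g | reads) for g = 0, 1, 2 copies of the minor allele.

    Combines per-genotype likelihoods P(reads | g) with a Hardy-Weinberg
    prior built from an assumed, previously estimated minor allele frequency.
    """
    priors = [(1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2]
    joint = [lk * pr for lk, pr in zip(likelihoods, priors)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("likelihoods and prior give zero total probability")
    return [j / total for j in joint]


def expected_genotype(posteriors: Sequence[float]) -> float:
    """Posterior mean genotype: sum over g of g * P(g | reads)."""
    return sum(g * p for g, p in enumerate(posteriors))


def covariance_matrix(
    lik: Sequence[Sequence[Sequence[float]]], freqs: Sequence[float]
) -> List[List[float]]:
    """Average covariance of normalized expected genotypes across sites.

    lik[s][i] holds the three genotype likelihoods of individual i at site s;
    freqs[s] is the assumed minor allele frequency at site s. Sites with a
    frequency of 0 or 1 carry no information here and are skipped.
    """
    n = len(lik[0])
    cov = [[0.0] * n for _ in range(n)]
    used = 0
    for site, f in zip(lik, freqs):
        if f <= 0.0 or f >= 1.0:
            continue
        # Center on the HWE mean genotype 2f, scale by its SD sqrt(2f(1-f)).
        scale = (2.0 * f * (1.0 - f)) ** 0.5
        z = [(expected_genotype(genotype_posteriors(gl, f)) - 2.0 * f) / scale
             for gl in site]
        for i in range(n):
            for j in range(n):
                cov[i][j] += z[i] * z[j]
        used += 1
    if used:
        cov = [[v / used for v in row] for row in cov]
    return cov


# Two individuals at one site (f = 0.5): the first has uninformative reads,
# the second is confidently homozygous for the minor allele.
example = covariance_matrix([[[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]], [0.5])
print(example)
```

In the example, the individual with flat likelihoods has a posterior equal to the prior, so its expected genotype equals the population mean 2f and its row and column of the matrix are zero: an uninformative sample is pulled toward the mean rather than forced into a possibly wrong hard call. The resulting matrix could then be passed to an eigendecomposition routine such as NumPy's `numpy.linalg.eigh`, whose leading eigenvectors give the principal components.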

Keywords: FST; next-generation sequencing; principal components analysis.

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Animals
Bombyx / genetics
Computational Biology
Computer Simulation
Data Interpretation, Statistical
Genetic Drift
Genetic Variation
Genetics, Population / statistics & numerical data*
Genotype
High-Throughput Nucleotide Sequencing / statistics & numerical data*
Likelihood Functions
Models, Genetic
Mutation
Principal Component Analysis
Selection, Genetic
