Detecting SNPs and estimating allele frequencies in clonal bacterial populations by sequencing pooled DNA

Bioinformatics. 2009 Aug 15;25(16):2074-5. doi: 10.1093/bioinformatics/btp344. Epub 2009 Jun 3.

Abstract

Summary: Here, we present a method for estimating the frequencies of SNP alleles present within pooled samples of DNA using high-throughput short-read sequencing. The method was tested on real data from six strains of the highly monomorphic pathogen Salmonella Paratyphi A, sequenced individually and in a pool. A variety of read mapping and quality-weighting procedures were tested to determine the optimal parameters, which afforded ≥80% sensitivity of SNP detection and strong correlation with true SNP frequency at poolwide read depth of 40x, declining only slightly at read depths 20–40x.
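The abstract does not spell out how read counts become frequency estimates. As a rough illustration only, the sketch below assumes that each base call at a SNP position is weighted by its Phred-derived probability of being correct, 1 − 10^(−Q/10), and that an allele's frequency is its share of the summed weights at that position. It also shows the two evaluation measures named above: sensitivity of SNP detection and correlation with true SNP frequency. The names `BaseCall`, `allele_frequencies`, `sensitivity` and `pearson` are hypothetical and are not taken from the authors' Perl script or from Maq.

```python
import math
from dataclasses import dataclass


@dataclass
class BaseCall:
    """One read's base at a SNP position (hypothetical pileup record)."""
    base: str  # called nucleotide, e.g. "A"
    qual: int  # Phred base quality


def phred_weight(q: int) -> float:
    """Probability that a call with Phred quality q is correct: 1 - 10^(-q/10)."""
    return 1.0 - 10 ** (-q / 10)


def allele_frequencies(calls, weighted=True):
    """Estimate allele frequencies at one position from pooled reads.

    Assumption: each read contributes its Phred weight (or 1.0 when
    weighted=False), and an allele's frequency is its share of the total.
    """
    totals = {}
    for c in calls:
        w = phred_weight(c.qual) if weighted else 1.0
        totals[c.base] = totals.get(c.base, 0.0) + w
    total = sum(totals.values())
    if total == 0:
        return {}
    return {b: w / total for b, w in totals.items()}


def sensitivity(called_sites, true_sites):
    """Fraction of true SNP positions that were detected in the pool."""
    true_sites = set(true_sites)
    if not true_sites:
        return float("nan")
    return len(true_sites & set(called_sites)) / len(true_sites)


def pearson(xs, ys):
    """Pearson correlation between estimated and true SNP frequencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # 40 reads at one position: 7 support "G", 33 support "A", all Q30.
    calls = [BaseCall("G", 30)] * 7 + [BaseCall("A", 30)] * 33
    print(allele_frequencies(calls))
```

In a pool of six strains, a SNP carried by one strain has a true frequency of 1/6 ≈ 0.167; in the 40-read example above the estimate for "G" is 7/40 = 0.175. Weighting matters only when qualities differ: low-quality calls are pulled toward zero, so a handful of Q3 errors shift the estimate far less than they would under plain counting. Note also that at 40x poolwide depth each of six equally represented strains contributes only about 40/6 ≈ 6.7 reads on average.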

Availability: The method was implemented in Perl and relies on the open-source software Maq for read mapping and SNP calling. The Perl script is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/pools/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • DNA / chemistry*
  • Databases, Genetic
  • Gene Frequency / genetics*
  • Polymorphism, Single Nucleotide*
  • Salmonella paratyphi A / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA