An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes

PLoS Comput Biol. 2021 Sep 13;17(9):e1008949. doi: 10.1371/journal.pcbi.1008949. eCollection 2021 Sep.

Abstract

A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy in recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.
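
To make the step "obtains the SNP patterns along the alignment" concrete, the following is a minimal sketch and not AFPhyloMix's actual implementation. It assumes an ungapped alignment given as (start, sequence) pairs against a reference string; the function name `snp_sites`, the thresholds `min_minor_freq` and `min_depth`, and the toy reads are all hypothetical, chosen only for illustration. It tallies base counts per reference position and reports the positions where non-reference bases are frequent enough to suggest a SNP. Tree construction and the Bayesian estimation of abundances are not covered here.

```python
from collections import Counter

def snp_sites(reference, reads, min_minor_freq=0.05, min_depth=4):
    """Return {position: base counts} for candidate SNP sites.

    Hypothetical helper: reads are (start, sequence) pairs aligned to the
    reference without gaps. Thresholds are illustrative assumptions.
    """
    # Tally the bases observed at every reference position.
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    sites = {}
    for pos, column in enumerate(counts):
        depth = sum(column.values())
        if depth < min_depth:
            continue  # too few reads to call anything at this position
        non_ref = depth - column[reference[pos]]
        if non_ref / depth >= min_minor_freq:
            sites[pos] = column
    return sites

# Toy pooled reads from an unknown mixture of haplotypes (made-up data).
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTAC"),
    (0, "ACGTAC"),
    (0, "ACTTAC"),
    (2, "GTACGT"),
    (2, "TTACGT"),
    (4, "ACGTAC"),
    (4, "ACGAAC"),
    (4, "ACGTAC"),
]
sites = snp_sites(reference, reads)
for pos in sorted(sites):
    print(pos, dict(sites[pos]))
```

The per-site base frequencies produced this way are the kind of raw signal from which haplotype abundances could later be inferred; a real pipeline would read the alignment from a SAM/BAM file and would need to account for sequencing error and indels.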

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • DNA Barcoding, Taxonomic*
  • DNA, Mitochondrial / genetics
  • Humans
  • Markov Chains
  • Monte Carlo Method
  • Phylogeny*
  • Polymorphism, Single Nucleotide

Substances

  • DNA, Mitochondrial

Grants and funding

This project was funded by Australian Research Council Discovery Project Grant DP160103474 to AR and National Natural Science Foundation of China Grant No. 31501879 to TL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.