Signal and noise in metabarcoding data

PLoS One. 2023 May 11;18(5):e0285674. doi: 10.1371/journal.pone.0285674. eCollection 2023.

Abstract

Metabarcoding is a powerful molecular tool for simultaneously surveying hundreds to thousands of species from a single sample, underpinning microbiome and environmental DNA (eDNA) methods. Deriving quantitative estimates of underlying biological communities from metabarcoding is critical for enhancing the utility of such approaches for health and conservation. Recent work has demonstrated that correcting for amplification biases in genetic metabarcoding data can yield quantitative estimates of template DNA concentrations. However, a major source of uncertainty in metabarcoding data stems from non-detections across technical PCR replicates, in which one replicate fails to detect a species observed in other replicates. Such non-detections are a special case of variability among technical replicates in metabarcoding data. While many sampling and amplification processes underlie observed variation in metabarcoding data, understanding the causes of non-detections is an important step in distinguishing signal from noise in metabarcoding studies. Here, we use both simulated and empirical data to (1) suggest how non-detections may arise in metabarcoding data, (2) outline steps to recognize uninformative data in practice, and (3) identify the conditions under which amplicon sequence data can reliably detect underlying biological signals. We show with both simulations and empirical data that, for a given species, the rate of non-detections among technical replicates is a function of both the template DNA concentration and species-specific amplification efficiency. Consequently, we conclude that metabarcoding datasets are strongly affected by (1) deterministic amplification biases during PCR and (2) stochastic sampling of amplicons during sequencing, both of which we can model, but also by (3) stochastic sampling of rare molecules prior to PCR, which remains a frontier for quantitative metabarcoding.
Our results highlight the importance of estimating species-specific amplification efficiencies and critically evaluating patterns of non-detection in metabarcoding datasets to better distinguish environmental signal from the noise inherent in molecular detections of rare targets.
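The three processes named in the abstract can be illustrated with a toy simulation of technical replicates. The sketch below is not the authors' model; it is a minimal Python illustration that assumes (a) the number of template molecules of each species entering a PCR reaction is Poisson-distributed around its expected concentration, (b) amplification is deterministic, with each molecule multiplied by (1 + a)^N after N cycles for a species-specific efficiency a, and (c) sequencing draws a fixed number of reads multinomially from the amplicon pool. The function names (`simulate_replicates`, `nondetection_rate`) and all parameter values are hypothetical.

```python
import numpy as np


def simulate_replicates(conc, eff, n_reps=3, n_cycles=35, read_depth=100_000, rng=None):
    """Toy simulation of technical PCR replicates (illustrative assumptions only).

    conc: expected template molecules per species per PCR reaction.
    eff:  hypothetical per-cycle amplification efficiency per species, in [0, 1].
    Returns an integer array of read counts with shape (n_reps, n_species).
    """
    rng = np.random.default_rng(rng)
    conc = np.asarray(conc, dtype=float)
    eff = np.asarray(eff, dtype=float)
    reads = np.zeros((n_reps, conc.size), dtype=np.int64)
    for r in range(n_reps):
        # (3) stochastic sampling of rare template molecules before PCR
        templates = rng.poisson(conc)
        # (1) deterministic amplification bias: (1 + eff)^n_cycles copies per molecule
        amplicons = templates * (1.0 + eff) ** n_cycles
        total = amplicons.sum()
        if total == 0:
            continue  # no amplicons at all: every species is a non-detection here
        # (2) stochastic sampling of amplicons during sequencing
        reads[r] = rng.multinomial(read_depth, amplicons / total)
    return reads


def nondetection_rate(reads):
    """Fraction of replicates in which each species received zero reads."""
    return (reads == 0).mean(axis=0)


# Hypothetical community: an abundant background species plus three targets.
# Species 1 vs 2: same efficiency, different concentration.
# Species 2 vs 3: same concentration, different efficiency.
rng = np.random.default_rng(2023)
conc = np.array([1000.0, 2.0, 50.0, 50.0])
eff = np.array([0.9, 0.9, 0.9, 0.5])
reads = simulate_replicates(conc, eff, n_reps=2000, rng=rng)
rates = nondetection_rate(reads)
print(np.round(rates, 3))
```

Under these assumptions, species 1 is missed mainly when its Poisson draw is zero (probability e^-2, about 14% of replicates), so its non-detections come from sampling before PCR. Species 3 is almost always present in the reaction, but its low efficiency dilutes it in the amplicon pool until the sequencing draw frequently contains none of its reads (roughly 30% of replicates here). Species 2, with both a higher concentration and a higher efficiency, is essentially never missed.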

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biodiversity
  • DNA / genetics
  • DNA Barcoding, Taxonomic* / methods
  • DNA, Environmental*
  • Polymerase Chain Reaction / methods
  • Uncertainty

Substances

  • DNA
  • DNA, Environmental

Grants and funding

ZG was supported by the Joint Institute for the Study of the Atmosphere and Ocean (JISAO) under NOAA Cooperative Agreement NA15OAR4320063. AJJ was supported by NOAA and the University of Washington. RPK was supported by the David and Lucile Packard Foundation. EAA and ED were supported by OceanKind. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.