Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq

Nucleic Acids Res. 2021 Aug 20;49(14):7986-7994. doi: 10.1093/nar/gkab621.

Abstract

Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could be used as a low-cost 'capture' method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller, with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance, with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data, and our ensemble method, VarCA, has the best overall performance.
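
The precision/recall figures above compare a caller's predicted variants against a reference set. As a minimal sketch of that calculation (not the authors' evaluation code), the Python below treats each variant as a hypothetical (chromosome, position, ref, alt) tuple and assumes exact matching between the two sets; the function name `precision_recall` and the example variants are illustrative assumptions, not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def precision_recall(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) for a call set against a truth set.

    Assumes exact matching on the full variant tuple; real benchmarks
    often normalize indel representation first, which is not done here.
    """
    true_positives = len(called & truth)
    # Precision: fraction of called variants that are in the truth set.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of truth variants that were called.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Illustrative data only (not from the study).
truth_set = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "GA"),
    ("chr2", 3000, "TC", "T"),
}
call_set = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "GA"),
}

example_precision, example_recall = precision_recall(call_set, truth_set)
print(f"precision={example_precision:.2f} recall={example_recall:.2f}")
```

In this toy case every call is correct but one true indel is missed, so precision is 1.00 and recall is 0.75. The abstract's figures are additionally restricted to ATAC-seq peak regions with at least 10 reads, a filter this sketch leaves out.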

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cell Line
  • Cell Line, Tumor
  • Chromatin Immunoprecipitation Sequencing / methods*
  • Genome / genetics*
  • Genome, Human / genetics
  • Humans
  • INDEL Mutation*
  • Jurkat Cells
  • Mice
  • Polymorphism, Single Nucleotide*
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Reproducibility of Results
  • Single-Cell Analysis / methods*