CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data

Bioinformatics. 2018 Feb 1;34(3):381-387. doi: 10.1093/bioinformatics/btx595.

Abstract

Motivation: DNA methylation is important for gene silencing and imprinting in both plants and animals. Recent advances in bisulfite sequencing allow detection of single nucleotide variations (SNVs) achieving high sensitivity, but accurately identifying heterozygous SNVs from partially C-to-T converted sequences remains challenging.

Results: We designed two methods, BayesWC and BinomWC, that substantially improved the precision of heterozygous SNV calls from ∼80% to 99% while retaining comparable recalls. With these SNV calls, we provided functions for allele-specific DNA methylation (ASM) analysis and visualizing the methylation status on reads. Applying ASM analysis to a previous dataset, we found that an average of 1.5% of investigated regions showed allelic methylation, which were significantly enriched in transposon elements and likely to be shared by the same cell-type. A dynamic fragment strategy was utilized for DMR analysis in low-coverage data and was able to find differentially methylated regions (DMRs) related to key genes involved in tumorigenesis using a public cancer dataset. Finally, we integrated 40 applications into the software package CGmapTools to analyze DNA methylomes. This package uses CGmap as the format interface, and designs binary formats to reduce the file size and support fast data retrieval, and can be applied for context-wise, gene-wise, bin-wise, region-wise and sample-wise analyses and visualizations.

Availability and implementation: The CGmapTools software is freely available at https://cgmaptools.github.io/.

Contact: [email protected].

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles*
  • Animals
  • DNA Methylation*
  • Eukaryota / genetics
  • Genomics / methods
  • Heterozygote*
  • Humans
  • Sequence Analysis, DNA / methods*
  • Software*
  • Sulfites

Substances

  • Sulfites
  • hydrogen sulfite