Calling differentially methylated regions from whole genome bisulphite sequencing with DMRcate

Nucleic Acids Res. 2021 Nov 8;49(19):e109. doi: 10.1093/nar/gkab637.

Abstract

Whole genome bisulphite sequencing (WGBS) permits the genome-wide study of single-molecule methylation patterns. One of the key goals of mammalian cell-type identity studies, in both normal differentiation and disease, is to locate differential methylation patterns across the genome. We discuss the most desirable characteristics for DML (differentially methylated locus) and DMR (differentially methylated region) detection tools in a genome-wide context, and choose a set of statistical methods that fully or partially satisfy these considerations to benchmark against one another. Our data simulation strategy is both thorough and biologically informed, employing distribution parameters derived from large-scale consortium datasets. We report DML detection ability with respect to coverage, group methylation difference, sample size, variability and covariate size, both marginally and jointly, and exhaustively across parameter combinations. We also benchmark these methods on FDR control and computational time. We use these results to inform the backend of, and to introduce, an expanded version of DMRcate, an existing DMR detection tool for microarray data that we have extended to call DMRs from WGBS data. We compare DMRcate to a set of alternative DMR callers using a similarly realistic simulation strategy. We find that DMRcate and RADmeth are the best predictors of DMRs, and that DMRcate is conclusively the fastest.
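As an illustration of what benchmarking DML callers on FDR control can involve, the following minimal Python sketch scores a set of per-locus p-values against known simulation truth. It is not taken from the paper or from DMRcate: the function names `bh_adjust` and `empirical_fdr_and_power`, the use of Benjamini-Hochberg adjustment, the 0.05 threshold and the toy input values are all assumptions made for this example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def empirical_fdr_and_power(adjusted, is_true_dml, alpha=0.05):
    """Compare calls at threshold alpha with the simulated truth labels.

    Returns (empirical FDR, power). Both default to 0.0 when undefined,
    i.e. when nothing is called or no locus is truly differential.
    """
    calls = [q <= alpha for q in adjusted]
    tp = sum(1 for c, t in zip(calls, is_true_dml) if c and t)
    fp = sum(1 for c, t in zip(calls, is_true_dml) if c and not t)
    n_true = sum(1 for t in is_true_dml if t)
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    power = tp / n_true if n_true else 0.0
    return fdr, power


# Hypothetical toy input: five loci with made-up p-values and truth labels.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
truth = [True, False, True, True, False]
fdr, power = empirical_fdr_and_power(bh_adjust(pvals), truth)
print(f"empirical FDR = {fdr:.2f}, power = {power:.2f}")
```

A caller controls the FDR in this sense when, averaged over many simulated datasets, the empirical FDR stays at or below the nominal threshold.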

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Computer Simulation
  • CpG Islands
  • DNA / genetics
  • DNA / metabolism*
  • DNA Methylation*
  • Epigenesis, Genetic*
  • Genome, Human*
  • Genomics / methods
  • Humans
  • Sample Size
  • Sequence Analysis, DNA / statistics & numerical data*
  • Sulfites / chemistry
  • Whole Genome Sequencing

Substances

  • Sulfites
  • DNA
  • hydrogen sulfite