An assessment of compositional methods for the analysis of DNA methylation-based deconvolution estimates

Epigenomics. 2024;16(15-16):1067-1080. doi: 10.1080/17501911.2024.2379242. Epub 2024 Aug 2.

Abstract

DNA methylation (DNAm)-based deconvolution estimates are relative data that form a composition, which standard methods (testing directly on cell proportions) are ill-suited to handle. In this study we examined the performance of an alternative method, analysis of compositions of microbiomes (ANCOM), for the analysis of DNAm-based deconvolution estimates. We performed two simulation studies comparing ANCOM to a standard approach (a two-sample t-test performed directly on cell proportions) and analyzed a real-world dataset from the Women's Health Initiative to evaluate the applicability of ANCOM to DNAm-based deconvolution estimates. Our findings indicate that ANCOM can effectively account for the compositional nature of DNAm-based deconvolution estimates. ANCOM adequately controls the false discovery rate while maintaining statistical power comparable to that of standard methods.

Keywords: ANCOM; Women’s Health Initiative; cell proportion data; compositional data; deconvolution.

Plain language summary

DNA methylation (DNAm)-based deconvolution provides highly accurate estimates of the proportion of each cell type in a mixed-cell type biological sample (e.g., whole blood). These estimates can be used to examine the association between cell type proportions and biological or clinical end points; for example, comparing the estimated neutrophil proportion in whole blood between smokers and non-smokers. Cell proportion data have unique features that present challenges for traditional and widely used statistical methods. In response to this issue, our work presents two simulation studies and a real-world analysis that benchmark the performance of current standard statistical methods against an alternative method called analysis of compositions of microbiomes (ANCOM), which was originally developed for the analysis of microbiome data. In our real-world analysis we used DNAm data collected from the Women’s Health Initiative Long Life Study I and compared the results of each method against a gold standard that is typically not available for these analyses. In each of our simulation studies, ANCOM was able to detect true differences in cell proportions between the groups being compared but had a much lower rate of false discovery than the standard statistical methods. Our real-world analysis demonstrated similar findings. Overall, our study highlights the potential of ANCOM as a powerful and robust method for analyzing DNAm-derived deconvolution estimates when the goal is to relate cell type proportions to biological or clinical end points. ANCOM’s ability to minimize false discovery while maintaining robust statistical power positions it as a valuable addition to the epigenomic analysis toolkit.
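
The "unique features" of cell proportion data mentioned above can be illustrated with a small sketch. It is not the study's simulation design or the ANCOM procedure itself. The cell type names and counts are hypothetical, chosen only to show that when one cell type changes in absolute terms, the proportions of all other cell types shift as well, while the log-ratio between two unchanged cell types stays the same. Log-ratios of this kind are the quantity that ANCOM-style methods compare between groups.

```python
import math

def close(counts):
    """Rescale non-negative values so they sum to 1 (a composition)."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}

def log_ratio(props, numerator, denominator):
    """Natural log of the ratio between two components of a composition."""
    return math.log(props[numerator] / props[denominator])

# Hypothetical absolute cell counts (arbitrary units), for illustration only.
# Only the neutrophil count differs between the two groups.
group_a = {"neutrophil": 50.0, "cd4t": 20.0, "bcell": 10.0, "monocyte": 20.0}
group_b = {"neutrophil": 100.0, "cd4t": 20.0, "bcell": 10.0, "monocyte": 20.0}

prop_a = close(group_a)
prop_b = close(group_b)

# Every proportion moves, even for cell types whose absolute count is unchanged.
prop_shift = {name: prop_b[name] - prop_a[name] for name in prop_a}

# The log-ratio of two unchanged cell types is identical in both groups ...
lr_unchanged_a = log_ratio(prop_a, "cd4t", "bcell")
lr_unchanged_b = log_ratio(prop_b, "cd4t", "bcell")

# ... while a log-ratio involving the changed cell type differs.
lr_neut_a = log_ratio(prop_a, "neutrophil", "bcell")
lr_neut_b = log_ratio(prop_b, "neutrophil", "bcell")

for name, shift in prop_shift.items():
    print(f"{name}: change in proportion = {shift:+.3f}")
print(f"log(cd4t/bcell): A = {lr_unchanged_a:.3f}, B = {lr_unchanged_b:.3f}")
print(f"log(neutrophil/bcell): A = {lr_neut_a:.3f}, B = {lr_neut_b:.3f}")
```

In this toy setting, a test applied directly to proportions would see a difference in every cell type, whereas a log-ratio comparison points to the neutrophils alone.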

MeSH terms

  • Computer Simulation
  • DNA Methylation*
  • Female
  • Humans
  • Microbiota / genetics