With the fast-growing field of epigenetics comes the need to better understand the intricacies of DNA methylation data analysis. High-throughput profiling using techniques such as Illumina's BeadArray assay enables the quantitative assessment of methylation. Challenges arise from the fact that the resulting methylation levels (so-called beta values) are proportions between 0 and 1, often from an asymmetric, bimodal distribution with peaks close to 0 and 1. Therefore, the majority of standard statistical approaches do not apply. The logit transformation of beta values into so-called M-values is a common way to circumvent this problem and is intended to allow the use of common statistical methods. However, the transformation from beta values to M-values does not necessarily result in an approximately homoscedastic distribution. Often, bimodality, asymmetry and heteroscedasticity persist even after transformation. We give an overview and discussion of methods suggested in recent years that attempt to address the characteristics of methylation data in univariate screening settings. In order to identify 'differential' methylation with respect to covariates of interest while adjusting for confounders, we compare parametric methods, such as linear and beta regression, and nonparametric methods, such as rank-based regression. Our goal is to sensitise researchers to the challenges and issues that arise from this type of data and to present possible solutions.
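As a brief illustration of the logit transformation mentioned above, the following minimal Python sketch maps beta values to M-values. It assumes the base-2 logit convention, M = log2(beta / (1 - beta)), which the text does not specify; the function names `beta_to_m` and `m_to_beta` and the clipping offset `eps` are hypothetical choices made for this example only.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Map a beta value in [0, 1] to an M-value via a base-2 logit (assumed convention)."""
    # Clip away from 0 and 1 so the logarithm stays finite; eps is an assumed offset.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse of the base-2 logit: map an M-value back to the (0, 1) scale."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# Illustrative beta values near both modes and in the middle of the range.
betas = [0.02, 0.10, 0.50, 0.90, 0.98]
m_values = [beta_to_m(b) for b in betas]
```

The transformation is monotone and symmetric around beta = 0.5, so two peaks near 0 and 1 on the beta scale become two peaks at large negative and large positive M-values. This is consistent with the observation that bimodality can remain after transformation.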
Keywords: DNA methylation; beta regression; logit transformation; rank-based regression; univariate screening.
Copyright © 2014 John Wiley & Sons, Ltd.