Evaluation of agreement between common clustering strategies for DNA methylation-based subtyping of breast tumours

Epigenomics. 2024 Dec 23:1-10. doi: 10.1080/17501911.2024.2441653. Online ahead of print.

Abstract

Aims: Clustering algorithms have been widely applied to tumor DNA methylation datasets to define methylation-based cancer subtypes. This study aimed to evaluate the agreement between subtypes obtained from common clustering strategies.

Materials & methods: We used tumor DNA methylation data from 409 women with breast cancer from the Melbourne Collaborative Cohort Study (MCCS) and 781 breast tumors from The Cancer Genome Atlas (TCGA). Agreement was assessed using the adjusted Rand index (ARI) for various combinations of the number of CpGs, the number of clusters and the clustering algorithm (hierarchical clustering, K-means, partitioning around medoids and recursively partitioned mixture models).
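The adjusted Rand index compares two partitions of the same tumors by counting sample pairs that are grouped together or apart in both, corrected for the agreement expected by chance. The following is a minimal pure-Python sketch of the standard pair-counting formulation (Hubert and Arabie); the function name and the integer label encoding are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster assignments of the same samples.

    labels_a, labels_b: sequences of cluster labels (any hashable values),
    one entry per tumor, in the same sample order. Hypothetical helper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: no pairs to disagree on

    # Contingency table n_ij: samples in cluster i of A and cluster j of B.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Marginal pair counts for each partition.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # chance-expected index
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions trivial): treat as perfect agreement.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Usage: the same split with permuted cluster labels gives perfect agreement.
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))
```

An ARI of 1 means identical partitions up to relabelling, values near 0 mean chance-level agreement, and values can be negative; the ARI < 0.7 range reported in the Results would be read off outputs of this kind.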

Results: Agreement patterns were inconsistent for both between-algorithm and within-algorithm comparisons, with generally poor to moderate agreement (ARI < 0.7). Results were qualitatively similar in the MCCS and TCGA, with better agreement for a moderate number of CpGs and fewer clusters (K = 2). Restricting the analysis to CpGs that were differentially methylated between tumor and normal tissue did not improve agreement.
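The "number of CpGs" is a tuning choice made before clustering. The abstract does not say how CpGs were ranked; selecting the most variable CpGs across tumors is a common convention and is assumed here purely for illustration. This minimal sketch uses a hypothetical dictionary of beta values keyed by CpG name.

```python
from statistics import pvariance


def top_variable_cpgs(beta, n):
    """Return the names of the n CpGs with the highest variance across tumors.

    beta: hypothetical dict mapping CpG name -> list of beta values, one per tumor.
    Ranking by variance is an assumed selection rule, not the study's stated method.
    """
    ranked = sorted(beta, key=lambda cpg: pvariance(beta[cpg]), reverse=True)
    return ranked[:n]


beta = {
    "cg_a": [0.1, 0.9, 0.5],  # highly variable across tumors
    "cg_b": [0.5, 0.5, 0.5],  # constant
    "cg_c": [0.2, 0.4, 0.3],  # mildly variable
}
print(top_variable_cpgs(beta, 2))
```

Clustering each such CpG subset and comparing the resulting labels pairwise with the ARI gives a within-algorithm agreement comparison across CpG counts; a differentially methylated subset would simply replace the variance ranking as the filter.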

Conclusion: Our study highlights that common clustering strategies involving an arbitrary choice of algorithm, number of clusters and number of methylation sites are likely to identify different DNA methylation-based breast tumor subtypes.

Keywords: Breast tumor; DNA methylation; adjusted Rand index; agreement; clustering algorithm.