CaMelia: imputation in single-cell methylomes based on local similarities between cells

Bioinformatics. 2021 Jul 27;37(13):1814-1820. doi: 10.1093/bioinformatics/btab029.

Abstract

Motivation: Single-cell DNA methylation sequencing detects methylation levels at single-cell resolution and is advancing our understanding of how epigenetic modifications regulate gene expression. However, almost all current technologies share an inherent problem: they cover only a limited number of CpGs. Addressing this inherent sparsity of the raw data is therefore essential for quantitative genome-wide analysis.

Results: Here, we report CaMelia, a CatBoost-based gradient boosting method that predicts missing methylation states from the locally paired similarity of intercellular methylation patterns. On real single-cell methylation datasets, CaMelia yielded significant gains in imputation performance over previous methods. Furthermore, when the imputed data were applied to the downstream analysis of cell-type identification, CaMelia helped to discover more intercellular differentially methylated loci that had been masked by the sparsity of the raw data. The clustering results demonstrated that CaMelia preserves cell-cell relationships and improves the identification of cell types and cell subpopulations.
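
The idea of imputing from local similarity between cells can be illustrated with a minimal sketch. It is not the CaMelia implementation: the function names (local_similarity, impute_site), the window parameter, and the toy matrix are hypothetical, and a simple similarity-weighted average stands in for the CatBoost model. It assumes binary methylation states (0 or 1) with None marking an unobserved CpG.

```python
from typing import List, Optional

# Rows are cells, columns are CpG sites; None marks a missing state.
Matrix = List[List[Optional[int]]]


def local_similarity(a, b, center, window):
    """Fraction of agreeing states between two cells around `center`.

    Only sites observed in both cells and within `window` positions of
    `center` are counted; `center` itself is excluded. Returns None when
    the two cells share no observed site in that neighbourhood.
    """
    lo = max(0, center - window)
    hi = min(len(a), center + window + 1)
    shared = 0
    agree = 0
    for j in range(lo, hi):
        if j == center or a[j] is None or b[j] is None:
            continue
        shared += 1
        agree += int(a[j] == b[j])
    return agree / shared if shared else None


def impute_site(data: Matrix, cell: int, site: int, window: int = 3):
    """Estimate a missing state as a similarity-weighted average.

    Each other cell with an observed state at `site` contributes that
    state, weighted by its local similarity to the target cell.
    Returns None when no cell provides usable evidence.
    """
    num = 0.0
    den = 0.0
    for other, row in enumerate(data):
        if other == cell or row[site] is None:
            continue
        sim = local_similarity(data[cell], row, site, window)
        if sim is None:
            continue
        num += sim * row[site]
        den += sim
    return num / den if den > 0 else None


# Toy example: cell 0 is missing site 2.
data = [
    [1, 1, None, 1, 0],  # target cell
    [1, 1, 1, 1, 0],     # locally identical to the target
    [0, 0, 0, 0, 1],     # locally opposite to the target
]
print(impute_site(data, cell=0, site=2, window=2))  # prints 1.0
```

In this toy case the locally identical cell receives weight 1 and the opposite cell weight 0, so the estimate follows the similar cell. In a learned approach such as the one described above, similarity-derived features of this kind would be passed to a gradient boosting model rather than averaged directly.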

Availability and implementation: Python code is available at https://github.com/JxTang-bioinformatics/CaMelia.

Supplementary information: Supplementary data are available at Bioinformatics online.