A downsampling method enables robust clustering and integration of single-cell transcriptome data

J Biomed Inform. 2022 Jun:130:104093. doi: 10.1016/j.jbi.2022.104093. Epub 2022 May 9.

Abstract

Random noise, sampling biases, and batch effects often confound true biological variation in single-cell RNA-sequencing (scRNA-seq) data. Adjusting for such biases is key to robust discoveries in downstream analyses such as cell clustering, gene selection, and data integration. Here we propose a model-based downsampling algorithm based on minimal unbiased representative points (MURPXMBD). MURPXMBD is designed to retrieve a set of representative points by reducing gene-wise random independent errors while retaining the covariance structure of biological origin, and hence to provide an unbiased representation of the cell population. Subsequent validation on benchmark datasets shows that MURPXMBD can improve the quality and accuracy of clustering algorithms and thus facilitate the discovery of new cell types. MURPXMBD also improves the performance of dataset integration algorithms. In summary, MURPXMBD serves as a useful noise-reduction method for single-cell sequencing analysis in biomedical studies.

Keywords: Clustering; Data integration; Downsampling; Noise-reduction; scRNA-seq.
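
The general idea of downsampling cells to representative points can be illustrated with a small sketch. This is not the MURPXMBD algorithm, whose model is not specified in the abstract. It is a stand-in that assumes plain k-means centroids as the representative points and synthetic data with independent gene-wise noise. The function name `representative_points` and all parameters are hypothetical. Averaging the cells assigned to each point shrinks independent per-gene errors, while the separation between the two simulated cell groups is kept.

```python
import numpy as np

def representative_points(X, k, n_iter=20, seed=0):
    """Hypothetical stand-in for a downsampling step: k-means centroids.

    X is a (cells x genes) matrix. Returns (centers, labels), where each
    non-empty center is the mean of the cells carrying its label.
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct cells chosen at random.
    centers = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each non-empty center to the mean of its assigned cells.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# Synthetic data (an assumption for illustration): two cell groups whose
# true profiles differ in every gene, plus independent unit-variance noise.
rng = np.random.default_rng(42)
n_per_group, n_genes = 200, 50
profiles = np.vstack([np.zeros(n_genes), np.full(n_genes, 10.0)])
group = np.repeat([0, 1], n_per_group)
truth = profiles[group]
X = truth + rng.normal(scale=1.0, size=truth.shape)

centers, labels = representative_points(X, k=20)

# Replace each cell by its representative point and compare both versions
# with the noise-free signal.
denoised = centers[labels]
err_raw = np.mean((X - truth) ** 2)
err_rep = np.mean((denoised - truth) ** 2)
print(f"per-gene MSE, raw cells: {err_raw:.3f}")
print(f"per-gene MSE, representative points: {err_rep:.3f}")
```

In this toy setting, 400 cells are reduced to 20 points, and the error relative to the noise-free profiles is lower for the representative points than for the raw cells. A real scRNA-seq workflow would apply such a step to normalized expression data, and a model-based method like the one proposed would choose the number of points rather than fix it by hand.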

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Gene Expression Profiling / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis* / methods
  • Transcriptome*