Supervised dimensionality reduction for exploration of single-cell data by HSS-LDA

Patterns (N Y). 2022 Jun 24;3(8):100536. doi: 10.1016/j.patter.2022.100536. eCollection 2022 Aug 12.

Abstract

Single-cell technologies generate large, high-dimensional datasets encompassing a diversity of omics. Dimensionality reduction captures the structure and heterogeneity of the original dataset, creating low-dimensional visualizations that contribute to the human understanding of data. Existing algorithms are typically unsupervised, using measured features to generate manifolds while disregarding known biological labels such as cell type or experimental time point. We repurpose linear discriminant analysis (LDA), a classification algorithm, for supervised dimensionality reduction of single-cell data. LDA identifies linear combinations of predictors that optimally separate a priori classes, enabling the study of specific aspects of cellular heterogeneity. We implement feature selection by hybrid subset selection (HSS) and demonstrate that this computationally efficient approach generates non-stochastic, interpretable axes that can be applied to diverse biological processes such as differentiation over time and cell cycle. We benchmark HSS-LDA against several popular dimensionality-reduction algorithms and illustrate its utility and versatility for the exploration of single-cell mass cytometry, transcriptomics, and chromatin accessibility data.
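To make the idea of supervised dimensionality reduction concrete, the following is a minimal sketch of classical Fisher LDA used as a projection, written in NumPy on simulated data. It is not the authors' HSS-LDA implementation: the HSS feature-selection step is omitted, and the function name `lda_axes`, the small ridge term, and the synthetic "cells" are all assumptions made for illustration.

```python
import numpy as np

def lda_axes(X, y, n_components=2):
    """Fisher LDA as a projection (hypothetical helper, not the paper's code).

    Returns the discriminant axes (features x components) and the
    projected data (cells x components).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_features = X.shape[1]
    mean_all = X.mean(axis=0)

    Sw = np.zeros((n_features, n_features))  # within-class scatter
    Sb = np.zeros((n_features, n_features))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        d = (mean_c - mean_all)[:, None]
        Sb += Xc.shape[0] * (d @ d.T)

    # Assumption: a tiny ridge keeps Sw invertible for near-collinear features.
    Sw += 1e-8 * np.trace(Sw) / n_features * np.eye(n_features)

    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor of Sw.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    axes = L_inv.T @ evecs[:, order]
    return axes, (X - mean_all) @ axes

# Simulated data (assumption): 3 labeled classes, 6 features,
# of which only the first two differ between classes.
rng = np.random.default_rng(0)
class_means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 100)
X = rng.normal(size=(300, 6))
X[:, :2] += class_means[y]

axes, embedding = lda_axes(X, y, n_components=2)
print(embedding.shape)  # one 2D coordinate per simulated cell
print(axes.round(3))    # per-feature loadings on each discriminant axis
```

Because the axes come from a deterministic eigendecomposition, rerunning the function on the same input returns the same projection, and each axis is a weighted sum of the measured features, so the loadings can be inspected directly.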

Keywords: LDA; algorithms; cell cycle; dimensionality reduction; feature interpretation; feature selection; linear discriminant analysis; omics; single cell; trajectory; visualization.