psupertime: supervised pseudotime analysis for time-series single-cell RNA-seq data

Will Macnair; Revant Gupta; Manfred Claassen

doi:10.1093/bioinformatics/btac227

Bioinformatics. 2022 Jun 24;38(Suppl 1):i290-i298. doi: 10.1093/bioinformatics/btac227.

Authors

Will Macnair¹, Revant Gupta², Manfred Claassen²,³

Affiliations

¹ Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich 8093, Switzerland.
² Internal Medicine I, Faculty of Medicine, University of Tübingen, University Hospital Tübingen, 72074, Germany.
³ Department of Computer Science, University of Tübingen, Tübingen 72074, Germany.

Abstract

Motivation: Improvements in single-cell RNA-seq technologies mean that studies measuring multiple experimental conditions, such as time series, have become more common. At present, few computational methods exist to infer time series-specific transcriptome changes, and such studies have therefore typically used unsupervised pseudotime methods. While these methods identify cell subpopulations and the transitions between them, they are not appropriate for identifying the genes that vary coherently along the time series. In addition, the orderings they estimate are based only on the major sources of variation in the data, which may not correspond to the processes related to the time labels.

Results: We introduce psupertime, a supervised pseudotime approach based on a regression model, which explicitly uses time-series labels as input. It identifies genes that vary coherently along a time series, assigns pseudotime values to individual cells, and yields a classifier that can be used to estimate labels for new data with unknown or differing labels. We show that psupertime outperforms benchmark classifiers in terms of identifying time-varying genes and provides better individual cell orderings than popular unsupervised pseudotime techniques. psupertime is applicable to any single-cell RNA-seq dataset with sequential labels (principally time series, but also drug dosage and disease progression), derived from either experimental design or user-selected cell cluster sequences, and provides a fast, interpretable tool for targeted identification of genes varying along with specific biological processes.
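
The supervised idea described in the Results can be illustrated with a small sketch. This is not the psupertime implementation or its API: the abstract states only that the method is "based on a regression model" that takes sequential labels as input. The sketch assumes, purely for illustration, an L1-penalised least-squares regression of the label index on gene expression as a simplified stand-in, fitted by coordinate descent on simulated data. All names (`simulate`, `fit_lasso`, `soft_threshold`) and the penalty value are hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the L1 proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(X, y, penalty, n_sweeps=200):
    """Coordinate descent for (1/2n)*||y - b0 - Xw||^2 + penalty*||w||_1."""
    n, p = len(X), len(X[0])
    # Centre columns and response so the intercept can be recovered afterwards.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]
    w = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of gene j with the partial residual
            # (the residual with gene j's own contribution added back).
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(p))
    return w, intercept


def simulate(n_per_label=30, n_labels=5, n_genes=10, seed=0):
    """Simulated cells: genes 0 and 1 track the time label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(n_labels):
        for _ in range(n_per_label):
            t = label + rng.gauss(0, 0.3)  # latent progress scattered around the label
            row = [t + rng.gauss(0, 0.5), -t + rng.gauss(0, 0.5)]
            row += [rng.gauss(0, 1.0) for _ in range(n_genes - 2)]
            X.append(row)
            y.append(float(label))
    return X, y


X, y = simulate()
weights, intercept = fit_lasso(X, y, penalty=0.2)  # penalty chosen arbitrarily

# Genes with nonzero weight are the ones the penalised model keeps.
selected = [j for j, wj in enumerate(weights) if wj != 0.0]

# Each cell's pseudotime is its projection onto the fitted weights.
pseudotime = [intercept + sum(wj * xj for wj, xj in zip(weights, row)) for row in X]

# Mean pseudotime per label, to check that the ordering follows the labels.
label_means = []
for label in range(5):
    values = [pt for pt, yi in zip(pseudotime, y) if yi == float(label)]
    label_means.append(sum(values) / len(values))

print("selected genes:", selected)
print("mean pseudotime per label:", [round(m, 2) for m in label_means])
```

In this toy setting the nonzero weights play the role of the label-associated genes, and the one-dimensional projection orders individual cells, including cells whose true progress differs from their nominal label.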

Availability and implementation: R package available at github.com/wmacnair/psupertime and code for results reproduction at github.com/wmacnair/psupplementary.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Gene Expression Profiling / methods
RNA-Seq
Sequence Analysis, RNA / methods
Single-Cell Analysis* / methods
Software*
Time Factors
Transcriptome