Benchmarking of analytical combinations for COVID-19 outcome prediction using single-cell RNA sequencing data

Brief Bioinform. 2023 May 19;24(3):bbad159. doi: 10.1093/bib/bbad159.

Abstract

Advances in single-cell transcriptomic technologies have led to increasing use of single-cell RNA sequencing (scRNA-seq) data in large-scale patient cohort studies. The resulting high-dimensional data can be summarized and incorporated into patient outcome prediction models in several ways; however, there is a pressing need to understand how these analytical decisions affect model quality. In this study, we evaluate the impact of analytical choices, in terms of model choices, ensemble learning strategies and integration approaches, on patient outcome prediction using five scRNA-seq COVID-19 datasets. First, we examine the difference in performance between using a single-view feature space and a multi-view feature space. Next, we survey multiple learning platforms, from classical machine learning to modern deep learning methods. Lastly, we compare different integration approaches for cases where combining datasets is necessary. Through benchmarking such analytical combinations, our study highlights the power of ensemble learning, consistency among different learning methods and robustness to dataset normalization when using multiple datasets as the model input.

Keywords: COVID-19; benchmark; disease outcome prediction; patient analysis; single-cell.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking*
  • COVID-19*
  • Gene Expression Profiling
  • Humans
  • Machine Learning
  • Sequence Analysis, RNA / methods