A Regularized Bayesian Dirichlet-multinomial Regression Model for Integrating Single-cell-level Omics and Patient-level Clinical Study Data

bioRxiv [Preprint]. 2024 Jun 6:2024.06.04.597391. doi: 10.1101/2024.06.04.597391.

Abstract

The abundance of various cell types can differ substantially between patients with different phenotypes, and even among patients with the same phenotype. Mounting evidence indicates that other clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. The model also employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Applied to three distinct diseases (pulmonary fibrosis, COVID-19, and non-small cell lung cancer), the model uncovers significant associations between specific cell types and clinical variables. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases.
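For readers unfamiliar with the model class, the following is a minimal sketch of a Dirichlet-multinomial regression likelihood for one patient's cell-type counts. It is not the authors' implementation. The log-linear link, the 0/1 inclusion indicators standing in for spike-and-slab selection, and all function and parameter names (`concentration`, `dm_log_pmf`, `intercepts`, `coefs`, `included`) are assumptions made for illustration; the hierarchical tree structure and the prior itself are not shown.

```python
import math


def concentration(x, intercepts, coefs, included):
    """Map clinical covariates to Dirichlet concentration parameters.

    Assumed log-linear link (an illustrative choice, not taken from the paper):
        alpha_k = exp(intercepts[k] + sum_j included[j][k] * coefs[j][k] * x[j])
    included[j][k] is a 0/1 indicator mimicking spike-and-slab selection:
    0 drops covariate j from cell type k, 1 keeps it.
    """
    alphas = []
    for k, b0 in enumerate(intercepts):
        eta = b0 + sum(
            x[j] * coefs[j][k] for j in range(len(x)) if included[j][k]
        )
        alphas.append(math.exp(eta))
    return alphas


def dm_log_pmf(counts, alphas):
    """Log probability of cell-type counts under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alphas)
    # Multinomial coefficient and the normalising ratio of gamma functions.
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    # Per-cell-type terms.
    for y, al in zip(counts, alphas):
        out += math.lgamma(y + al) - math.lgamma(al) - math.lgamma(y + 1)
    return out


# Hypothetical patient: counts for 3 cell types and 2 standardised covariates.
counts = [120, 30, 50]
x = [0.4, -1.0]
intercepts = [1.0, 0.2, 0.5]
coefs = [[0.8, 0.0, -0.3], [0.0, 0.5, 0.0]]
included = [[1, 0, 1], [0, 1, 0]]

alphas = concentration(x, intercepts, coefs, included)
log_lik = dm_log_pmf(counts, alphas)
```

In a full Bayesian treatment, the indicators and coefficients would be sampled rather than fixed, and the patient-level log-likelihoods would be summed across patients.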

Keywords: Dirichlet-multinomial regression models; hierarchical tree; integrative analysis; single-cell RNA sequencing; spike-and-slab priors.

Publication types

  • Preprint