A Reproducibility Focused Meta-Analysis Method for Single-Cell Transcriptomic Case-Control Studies Uncovers Robust Differentially Expressed Genes

bioRxiv [Preprint]. 2024 Oct 16:2024.10.15.618577. doi: 10.1101/2024.10.15.618577.

Abstract

Here we systematically studied the reproducibility of differentially expressed genes (DEGs) in previously published Alzheimer's disease (AD), Parkinson's disease (PD), and COVID-19 scRNA-seq studies. We found that while transcriptional scores created from DEGs in individual PD and COVID-19 datasets had moderate predictive power for the case-control status of other datasets (mean AUC=0.77 and 0.75, respectively), genes from individual AD datasets had poor predictive power (mean AUC=0.68). We developed a non-parametric meta-analysis method, SumRank, based on the reproducibility of relative differential expression ranks across datasets. The meta-analysis genes had improved predictive power (AUCs of 0.88, 0.91, and 0.78, respectively). By multiple other metrics, the specificity and sensitivity of these genes were substantially higher than those of genes discovered by dataset merging and by inverse-variance-weighted p-value aggregation methods. The DEGs revealed known and novel biological pathways, and we validated the BCAT1 gene as down-regulated in oligodendrocytes in an AD mouse model. Our analyses show that for heterogeneous diseases, DEGs from individual studies often have low reproducibility, but combining information across multiple datasets promotes the rigorous discovery of reproducible DEGs.
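
The abstract describes SumRank only as a non-parametric method based on the reproducibility of relative differential expression ranks across datasets. The following is a minimal Python sketch of that general idea, not the authors' implementation: it assumes each dataset supplies one differential expression p-value per gene, ranks genes within each dataset, and sums the ranks so that genes ranked near the top in many datasets get the lowest scores. The function names (`rank_matrix`, `sum_rank_scores`, `permutation_pvalues`), the tie handling, the normalization by gene count, and the permutation-based null are all illustrative assumptions; the published method may differ in each of these details.

```python
import numpy as np


def rank_matrix(pvals):
    """Rank genes within each dataset (row); rank 1 = smallest p-value.

    Assumption: ties are broken by column order, which is a simplification.
    """
    pvals = np.asarray(pvals, dtype=float)
    # argsort applied twice converts values to 0-based ranks along each row.
    return pvals.argsort(axis=1).argsort(axis=1) + 1


def sum_rank_scores(pvals):
    """Sum of relative ranks (rank / n_genes) across datasets, per gene."""
    ranks = rank_matrix(pvals)
    n_genes = ranks.shape[1]
    return ranks.sum(axis=0) / n_genes


def permutation_pvalues(pvals, n_perm=1000, seed=0):
    """Empirical p-value per gene under a hypothetical null in which ranks
    are shuffled independently within each dataset."""
    rng = np.random.default_rng(seed)
    ranks = rank_matrix(pvals)
    observed = ranks.sum(axis=0)  # integer rank sums avoid float ties
    # Pool the rank sums of all genes over all permutations into one null.
    null = np.concatenate(
        [rng.permuted(ranks, axis=1).sum(axis=0) for _ in range(n_perm)]
    )
    null.sort()
    # Number of null rank sums that are as small as or smaller than observed.
    counts = np.searchsorted(null, observed, side="right")
    return (counts + 1) / (null.size + 1)


# Toy input: 3 datasets (rows) x 5 genes (columns); values are made up.
example_pvals = np.array([
    [0.001, 0.2, 0.5, 0.8, 0.9],
    [0.002, 0.6, 0.1, 0.7, 0.4],
    [0.0005, 0.3, 0.9, 0.05, 0.6],
])

scores = sum_rank_scores(example_pvals)
perm_p = permutation_pvalues(example_pvals)
print("sum of relative ranks:", scores)
print("permutation p-values:", perm_p)
```

In this toy input the first gene has the smallest p-value in every dataset, so it receives the lowest summed rank. Because only ranks enter the score, a gene is not favored merely because one large dataset reports an extreme p-value, which is the intuition behind prioritizing cross-dataset reproducibility.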

Publication types

  • Preprint