Population-level integration of single-cell datasets enables multi-scale analysis across samples

Nat Methods. 2023 Nov;20(11):1683-1692. doi: 10.1038/s41592-023-02035-2. Epub 2023 Oct 9.

Abstract

The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population-level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli to population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variation using sample embeddings, which reveal genes associated with batch effects and biological effects. scPoli is also applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration, facilitating not only atlas use but also atlas interpretation by means of multi-scale analyses.

MeSH terms

  • Chromatin / genetics
  • Genomics*
  • Humans
  • Leukocytes, Mononuclear*
  • Single-Cell Analysis

Substances

  • Chromatin