Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design

GigaScience. 2021 Sep 22;10(9):giab062. doi: 10.1093/gigascience/giab062.

Abstract

Background: Pooling cells from multiple biological samples prior to library preparation within the same single-cell RNA sequencing experiment provides several advantages, including lower library preparation costs and reduced unwanted technical variation, such as batch effects. Computational demultiplexing tools based on natural genetic variation between individuals provide a simple approach to demultiplex samples, which does not require complex additional experimental procedures. However, to our knowledge, these tools have not been evaluated in cancer, where somatic variants, which can differ between cells from the same sample, may obscure the signal from natural genetic variation.
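
To illustrate the general idea of demultiplexing by natural genetic variation, the following is a minimal sketch, not the method of any specific tool evaluated here. It assumes that donor genotypes at a set of SNPs are already known and that each cell has reference and alternate allele read counts at those SNPs. The function names, the data layout, and the error rate of 0.01 are all hypothetical choices made for illustration.

```python
import math

# Assumed per-read error rate (hypothetical value chosen for illustration).
ERROR_RATE = 0.01


def alt_read_prob(genotype, error_rate=ERROR_RATE):
    """Probability that a single read shows the alternate allele.

    genotype is the number of alternate alleles the donor carries (0, 1 or 2).
    """
    p = genotype / 2.0
    return p * (1.0 - error_rate) + (1.0 - p) * error_rate


def assign_cell(cell_counts, donor_genotypes):
    """Assign one cell to the donor whose genotypes best explain its reads.

    cell_counts:     {snp_id: (ref_reads, alt_reads)}
    donor_genotypes: {donor: {snp_id: genotype}}
    Returns (best_donor, {donor: log_likelihood}).
    """
    loglik = {}
    for donor, genotypes in donor_genotypes.items():
        total = 0.0
        for snp, (ref, alt) in cell_counts.items():
            if snp not in genotypes:
                continue  # skip SNPs without a genotype for this donor
            p_alt = alt_read_prob(genotypes[snp])
            total += alt * math.log(p_alt) + ref * math.log(1.0 - p_alt)
        loglik[donor] = total
    best = max(loglik, key=loglik.get)
    return best, loglik


# Toy example with two donors that differ at both SNPs.
donors = {
    "donor_A": {"snp1": 0, "snp2": 2},
    "donor_B": {"snp1": 2, "snp2": 0},
}
cell = {"snp1": (0, 3), "snp2": (2, 0)}
best_donor, scores = assign_cell(cell, donors)
```

In this toy example the cell carries only alternate reads at snp1 and only reference reads at snp2, so donor_B is the better match. Somatic variants within a tumor sample would act as additional noise in such a comparison, which is the concern the benchmark addresses.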

Results: Here, we performed in silico benchmark evaluations by combining raw sequencing reads from multiple single-cell samples in high-grade serous ovarian cancer, which has a high copy number burden, and lung adenocarcinoma, which has a high tumor mutational burden. Our results confirm that genetic demultiplexing tools can be effectively deployed on cancer tissue using a pooled experimental design, although high proportions of ambient RNA from cell debris reduce performance.
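
Because the samples are combined in silico, the true sample of origin is known for every cell barcode, so demultiplexing output can be scored directly. The sketch below shows one plausible way to do this with per-sample precision and recall. The choice of metrics, the function name, the barcode suffix convention, and the "unassigned" label are assumptions made for illustration and are not taken from the published workflow.

```python
from collections import Counter


def per_sample_precision_recall(truth, predicted):
    """Score demultiplexing calls against known sample labels.

    truth:     {barcode: true_sample}
    predicted: {barcode: called_sample}; barcodes missing from this dict
               are treated as "unassigned" (hypothetical label).
    Returns {sample: {"precision": float, "recall": float}}.
    """
    true_pos = Counter()
    n_called = Counter()
    n_true = Counter()
    for barcode, true_label in truth.items():
        n_true[true_label] += 1
        called = predicted.get(barcode, "unassigned")
        n_called[called] += 1
        if called == true_label:
            true_pos[true_label] += 1
    return {
        sample: {
            "precision": true_pos[sample] / n_called[sample] if n_called[sample] else 0.0,
            "recall": true_pos[sample] / n_true[sample],
        }
        for sample in n_true
    }


# Toy example: barcodes carry a hypothetical per-sample suffix added when pooling.
truth = {"AAAC-1": "s1", "AAAG-1": "s1", "CCCA-2": "s2", "CCCT-2": "s2"}
calls = {"AAAC-1": "s1", "AAAG-1": "s2", "CCCA-2": "s2", "CCCT-2": "s2"}
metrics = per_sample_precision_recall(truth, calls)
```

Here one s1 cell is miscalled as s2, so s1 has perfect precision but recall of 0.5, while s2 has perfect recall but precision of 2/3.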

Conclusions: This strategy provides significant cost savings through pooled library preparation. To facilitate similar analyses at the experimental design phase, we provide freely accessible code and a reproducible Snakemake workflow built around the best-performing tools found in our in silico benchmark evaluations, available at https://github.com/lmweber/snp-dmx-cancer.

Keywords: benchmarking; cancer; computational methods; genetic demultiplexing; high-grade serous ovarian cancer; lung adenocarcinoma; simulations; single-cell RNA sequencing; tumor mutational burden.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Gene Library
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Neoplasms* / genetics
  • RNA
  • Research Design*
  • Software

Substances

  • RNA