Machine-learning-optimized Cas12a barcoding enables the recovery of single-cell lineages and transcriptional profiles

Mol Cell. 2022 Aug 18;82(16):3103-3118.e8. doi: 10.1016/j.molcel.2022.06.001. Epub 2022 Jun 24.

Abstract

The development of CRISPR-based barcoding methods creates an exciting opportunity to understand cellular phylogenies. We present a compact, tunable, high-capacity Cas12a barcoding system called dual acting inverted site array (DAISY). We combined high-throughput screening and machine learning to predict and optimize the 60-bp DAISY barcode sequences. After optimization, top-performing barcodes had ∼10-fold increased capacity relative to the best random-screened designs and performed reliably across diverse cell types. DAISY barcode arrays generated ∼12 bits of entropy and ∼66,000 unique barcodes. Thus, DAISY barcodes, at a fraction of the size of Cas9 barcodes, achieved high-capacity barcoding. We coupled DAISY barcoding with single-cell RNA-seq to recover lineages and gene expression profiles from ∼47,000 human melanoma cells. A single DAISY barcode recovered up to ∼700 lineages from one parental cell. This analysis revealed heritable single-cell gene expression and potential epigenetic modulation of memory gene transcription. Overall, Cas12a DAISY barcoding is an efficient tool for investigating cell-state dynamics.
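The abstract reports barcode capacity both as entropy (∼12 bits) and as a count of unique barcodes (∼66,000). As an illustration only, the sketch below assumes the entropy is Shannon entropy in bits, H = -Σ p_i log2 p_i, taken over the observed frequencies of edited barcode outcomes; the abstract does not state the exact definition, and the function names here are hypothetical. Under that assumption, 12 bits corresponds to 2^12 = 4,096 equally frequent outcomes, and any set of distinct barcodes whose frequencies are uneven carries fewer bits than log2 of its size.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy_bits(barcodes: Iterable[str]) -> float:
    """Hypothetical helper: Shannon entropy (bits) of observed barcode frequencies.

    Assumes each string is one observed barcode outcome (e.g., one cell's
    edited array), and computes H = -sum(p_i * log2(p_i)).
    """
    counts = Counter(barcodes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def effective_barcode_number(entropy_bits: float) -> float:
    """Hypothetical helper: number of equally frequent barcodes with the same entropy (2**H)."""
    return 2.0 ** entropy_bits


if __name__ == "__main__":
    # Four distinct, equally frequent toy outcomes give exactly 2 bits.
    print(shannon_entropy_bits(["A", "C", "G", "T"]))
    # A skewed distribution over two outcomes gives less than 1 bit.
    print(shannon_entropy_bits(["A", "A", "A", "B"]))
    # 12 bits corresponds to 4096 equally frequent barcodes.
    print(effective_barcode_number(12))
```

The toy strings are placeholders, not DAISY sequences; with real data the input would be the per-cell barcode calls, and the gap between log2(number of unique barcodes) and the measured entropy indicates how unevenly the outcomes are distributed.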

Keywords: CRISPR barcoding; Cas12a; PRC2; high throughput screening; lineage tracking; machine learning; melanoma; online learning optimization; single cell genomics; transcriptional memory.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • CRISPR-Cas Systems*
  • Cell Lineage / genetics
  • DNA Barcoding, Taxonomic* / methods
  • Humans
  • Machine Learning
  • Phylogeny