Private information leakage from single-cell count matrices

Cell. 2024 Nov 14;187(23):6537-6549.e10. doi: 10.1016/j.cell.2024.09.012. Epub 2024 Oct 2.

Abstract

The increase in publicly available human single-cell datasets, encompassing millions of cells from many donors, has greatly enhanced our understanding of complex biological processes. However, the accessibility of these datasets raises significant privacy concerns. Because of the inherent noise in single-cell measurements and the scarcity of population-scale single-cell datasets, recent studies quantifying private information leakage have focused on the sharing of bulk gene expression data. To address this gap, we demonstrate that individuals in single-cell gene expression datasets are vulnerable to linking attacks, in which attackers can infer their sensitive phenotypic information using publicly available tissue- or cell-type-specific expression quantitative trait loci (eQTL) information. We further develop a method for genotype prediction and genotype-phenotype linking that remains effective without relying on eQTL information. We show that variants from one study can be exploited to uncover private information about individuals in another study.
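
The general shape of an eQTL-based linking attack can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the method developed in the paper: the gene and SNP names, the expression cutoffs (`low`, `high`), the effect direction (`sign`), the genotype panel, and the simple threshold rule for predicting genotypes are all hypothetical and chosen only for illustration. The sketch averages per-cell counts into one expression profile per donor, guesses an alternate-allele dosage for each eQTL from that profile, and then picks the individual in a genotype panel whose genotypes are closest to the guess.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Average per-cell counts into one expression value per gene per donor.

    cells: iterable of (donor_id, {gene: count}) pairs, one pair per cell.
    A gene missing from a cell is treated as a count of zero.
    """
    totals = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for donor, counts in cells:
        n_cells[donor] += 1
        for gene, count in counts.items():
            totals[donor][gene] += count
    return {d: {g: s / n_cells[d] for g, s in genes.items()}
            for d, genes in totals.items()}


def predict_dosages(expression, eqtls):
    """Guess the alt-allele dosage (0, 1 or 2) at each eQTL SNP.

    Hypothetical rule: two cutoffs split expression into three levels;
    a negative effect sign reverses the mapping from level to dosage.
    """
    dosages = {}
    for q in eqtls:
        value = expression.get(q["gene"], 0.0)
        level = 0 if value < q["low"] else (1 if value < q["high"] else 2)
        dosages[q["snp"]] = level if q["sign"] > 0 else 2 - level
    return dosages


def link(predicted, panel):
    """Return the panel individual whose genotypes are closest to the guess."""
    def distance(genotypes):
        return sum(abs(predicted[s] - genotypes[s]) for s in predicted)
    return min(panel, key=lambda name: distance(panel[name]))


# Toy single-cell counts (assumed data): two cells per donor.
cells = [
    ("donor_A", {"GENE1": 9, "GENE2": 1}),
    ("donor_A", {"GENE1": 11, "GENE2": 0}),
    ("donor_B", {"GENE1": 1, "GENE2": 6}),
    ("donor_B", {"GENE1": 2, "GENE2": 8}),
]

# Toy eQTL table (assumed): cutoffs and signs are made up.
eqtls = [
    {"gene": "GENE1", "snp": "rs_toy1", "low": 3.0, "high": 8.0, "sign": +1},
    {"gene": "GENE2", "snp": "rs_toy2", "low": 2.0, "high": 5.0, "sign": -1},
]

# Toy genotype panel (assumed) that the attacker can also see phenotypes for.
panel = {
    "person_X": {"rs_toy1": 2, "rs_toy2": 2},
    "person_Y": {"rs_toy1": 0, "rs_toy2": 0},
    "person_Z": {"rs_toy1": 1, "rs_toy2": 1},
}

links = {donor: link(predict_dosages(expr, eqtls), panel)
         for donor, expr in pseudobulk(cells).items()}
print(links)
```

In this toy data the expression profiles are cleanly separated, so each donor maps to exactly one panel individual. Real single-cell data are far noisier, and a practical attack would need many more eQTLs and a statistical genotype model rather than fixed cutoffs.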

Keywords: genome privacy; linking attack; single-cell gene expression.

MeSH terms

  • Cell Count
  • Genotype
  • Humans
  • Phenotype
  • Quantitative Trait Loci*
  • Single-Cell Analysis* / methods