Segmentation aware probabilistic phenotyping of single-cell spatial protein expression data

Nat Commun. 2025 Jan 4;16(1):389. doi: 10.1038/s41467-024-55214-w.

Abstract

Spatial protein expression technologies can map cellular content and organization by simultaneously quantifying the expression of >40 proteins at subcellular resolution within intact tissue sections and cell lines. However, the necessary segmentation of images into single cells is challenging and error prone, easily confounding the interpretation of cellular phenotypes and cell clusters. To address these limitations, we present STARLING, a probabilistic machine learning model designed to quantify cell populations from spatial protein expression data while accounting for segmentation errors. To evaluate performance, we develop a comprehensive benchmarking workflow by generating highly multiplexed imaging data of cell line pellet standards with controlled cell content and marker expression, and additionally establish a score to quantify the biological plausibility of discovered cellular phenotypes on patient-derived tissue sections. Moreover, we generate spatial expression data of the human tonsil, a densely packed tissue prone to segmentation errors, and demonstrate that cellular states captured by STARLING identify known cell types not visible with other methods and enable quantification of intra- and inter-individual heterogeneity.

MeSH terms

  • Cell Line
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Machine Learning*
  • Palatine Tonsil / cytology
  • Palatine Tonsil / metabolism
  • Phenotype*
  • Single-Cell Analysis* / methods