Velociraptor: Cross-Platform Quantitative Search Using Hallmark Cell Features

bioRxiv [Preprint]. 2024 May 4:2024.05.01.591375. doi: 10.1101/2024.05.01.591375.

Abstract

A key challenge for single cell discovery analysis is to identify new cell types, describe them quantitatively, and seek these novel cells in new studies, often using a different platform. Over the last decade, tools were developed to address identification and quantitative description of cells in human tissues and tumors. However, automated validation of populations at the single cell level has struggled due to the cytometry field's reliance on hierarchical, ordered use of features and on platform-specific rules for data processing and analysis. Here we present Velociraptor, a workflow that implements Marker Enrichment Modeling in three cross-platform modules: 1) identification of cells specific to disease states, 2) description of hallmark features for each cell and population, and 3) searching for cells matching one or more hallmark feature sets in a new dataset. A key advance is that Velociraptor registers cells between datasets, including between flow cytometry and quantitative imaging datasets that use different but overlapping feature sets. Four datasets were used to challenge Velociraptor and reveal new biological insights. Working at the individual sample level, Velociraptor tracked the abundance of clinically significant glioblastoma brain tumor cell subsets and characterized the cells that predominate in recurrent tumors as a close match for rare, negative prognostic cells originally observed in matched pre-treatment tumors. In patients with inborn errors of immunity, Velociraptor identified genotype-specific cells associated with GATA2 haploinsufficiency. Finally, in cross-platform analysis of immune cells in multiplex imaging of breast cancer, Velociraptor sought and correctly identified memory T cell subsets in tumors. Different phenotypic descriptions generated by algorithms or humans were shown to be effective as search inputs, indicating that cell identity need not be described in terms of per-feature cutoffs or strict hierarchical analyses. Velociraptor thus identifies cells based on hallmark feature sets, such as protein expression signatures, and works effectively with data from multiple sources, including suspension flow cytometry, imaging, and search text based on known or theoretical cell features.
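
The search step (module 3) can be illustrated with a minimal sketch. This is not the Velociraptor implementation: it assumes that each cell and each hallmark feature set is already expressed as per-feature enrichment scores on a common scale, and it compares them only on the features the two share, which is one simple way to handle different but overlapping feature sets. The function names `rmsd_on_shared` and `search_cells`, the `max_rmsd` and `min_shared` parameters, and all marker values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def rmsd_on_shared(hallmark, cell):
    """Root-mean-square difference over features present in both inputs.

    Returns (rmsd, number_of_shared_features); rmsd is None when the
    two feature sets do not overlap at all.
    """
    shared = hallmark.keys() & cell.keys()
    if not shared:
        return None, 0
    total = sum((hallmark[f] - cell[f]) ** 2 for f in shared)
    return sqrt(total / len(shared)), len(shared)


def search_cells(hallmark, cells, max_rmsd=2.0, min_shared=2):
    """Return indices of cells whose scores match the hallmark set.

    A cell matches when it shares at least `min_shared` features with the
    hallmark set and its RMSD on those features is at most `max_rmsd`.
    Both thresholds are illustrative assumptions, not published values.
    """
    matches = []
    for index, cell in enumerate(cells):
        rmsd, n_shared = rmsd_on_shared(hallmark, cell)
        if rmsd is not None and n_shared >= min_shared and rmsd <= max_rmsd:
            matches.append(index)
    return matches


# Hypothetical hallmark set, e.g. written by hand or produced by an algorithm.
hallmark = {"CD3": 8, "CD45RO": 6, "CD19": -5}

# Hypothetical cells from a second platform with a different feature panel.
cells = [
    {"CD3": 7, "CD45RO": 5, "Ki67": 1},  # close on both shared features
    {"CD3": -2, "CD19": 6},              # opposite pattern on shared features
    {"Ki67": 3},                         # no shared features, cannot be scored
]

print(search_cells(hallmark, cells))
```

Because the comparison uses a distance over shared features rather than per-feature cutoffs, the same hallmark set can be applied to a panel that measures only some of its markers; cells with too few shared features are skipped rather than guessed at.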

Publication types

  • Preprint