Conditional Similarity Triplets Enable Covariate-Informed Representations of Single-Cell Data

Chi-Jane Chen; Haidong Yi; Natalie Stanley

doi:10.21203/rs.3.rs-4915088/v1

Res Sq [Preprint]. 2024 Sep 12:rs.3.rs-4915088. doi: 10.21203/rs.3.rs-4915088/v1.

Authors

Chi-Jane Chen¹, Haidong Yi¹, Natalie Stanley²

Affiliations

¹ Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
² Department of Computer Science and Computational Medicine Program, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Abstract

Single-cell technologies enable comprehensive profiling of diverse immune cell types through the measurement of multiple genes or proteins per cell. To translate data from immune profiling assays into powerful diagnostics, machine learning approaches are used to compute per-sample immunological summaries, or featurizations, that can serve as inputs to models for outcomes of interest. Current supervised learning approaches for computing per-sample representations are optimized based only on the outcome variable to be predicted and do not take into account clinically relevant covariates that are likely to also be measured. Here we expand the optimization problem to also take into account such additional patient covariates so that they directly inform the learned per-sample representations. To do this, we introduce CytoCoSet, a set-based encoding method whose loss function includes an additional triplet term that penalizes samples with similar covariates for having disparate per-sample representations. Overall, incorporating clinical covariates leads to improved prediction of clinical phenotypes.
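The idea of adding a covariate-driven triplet term to an outcome loss can be illustrated with a minimal sketch. This is not the CytoCoSet implementation: the abstract does not give the exact loss, so the hinge-style triplet form, the rule for choosing triplets (nearest covariate value as positive, farthest as negative), and every name below (`build_triplets`, `triplet_loss`, `total_loss`, `weight`, `margin`) are assumptions made for illustration only.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_triplets(covariate):
    """Assumed triplet rule: for each anchor sample, the positive is the
    sample with the closest covariate value and the negative is the sample
    with the farthest covariate value."""
    triplets = []
    n = len(covariate)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if len(others) < 2:
            continue  # need two other samples to form a triplet
        pos = min(others, key=lambda j: abs(covariate[j] - covariate[i]))
        neg = max(others, key=lambda j: abs(covariate[j] - covariate[i]))
        if pos != neg:
            triplets.append((i, pos, neg))
    return triplets


def triplet_loss(embeddings, triplets, margin=1.0):
    """Hinge-style triplet penalty, averaged over triplets. It is zero when
    each anchor is closer to its positive than to its negative by at least
    `margin` (in squared distance)."""
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, n in triplets:
        d_pos = squared_distance(embeddings[a], embeddings[p])
        d_neg = squared_distance(embeddings[a], embeddings[n])
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(triplets)


def total_loss(prediction_loss, embeddings, covariate, weight=0.5, margin=1.0):
    """Outcome loss plus a weighted covariate triplet term (assumed form)."""
    triplets = build_triplets(covariate)
    return prediction_loss + weight * triplet_loss(embeddings, triplets, margin)


# Hypothetical toy data: four samples with an age covariate.
ages = [30, 32, 70, 72]
# Embeddings that respect covariate similarity (samples 0,1 and 2,3 are close).
aligned = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
# Embeddings that ignore it (samples 1 and 2 are swapped).
misaligned = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]]

loss_aligned = total_loss(0.3, aligned, ages)
loss_misaligned = total_loss(0.3, misaligned, ages)
```

In this toy setup the triplet term vanishes for the aligned embeddings, so the total equals the outcome loss alone, while the misaligned embeddings incur an extra penalty. The `weight` parameter is a hypothetical knob controlling how strongly the covariate shapes the representation relative to the outcome.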

Keywords: clinical prediction; immune profiling; single-cell.

Publication types

Preprint

Grants and funding

R21 AI171745/AI/NIAID NIH HHS/United States