The Observed T Cell Receptor Space database enables paired-chain repertoire mining, coherence analysis, and language modeling

Cell Rep. 2024 Sep 24;43(9):114704. doi: 10.1016/j.celrep.2024.114704. Epub 2024 Aug 29.

Abstract

T cell activation is governed by T cell receptors (TCRs), heterodimers of two sequence-variable chains (often an α and β chain) that synergistically recognize antigen fragments presented on cell surfaces. Despite this paired structure, existing repositories collect only single-chain, not paired-chain, TCR sequence data. We addressed this gap by creating the Observed TCR Space (OTS) database, a source of consistently processed and annotated, full-length, paired-chain TCR sequences. Currently, OTS contains 5.35 million redundant (1.63 million non-redundant), predominantly human sequences from 50 studies and at least 75 individuals. Using OTS, we identify pairing biases, public TCRs, and distinct chain coherence patterns relative to antibodies. We also release a paired-chain TCR language model, providing paired embedding representations and a method for residue in-filling conditional on the partner chain. OTS will be updated as a central community resource and is freely downloadable and available as a web application.
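The abstract distinguishes redundant from non-redundant sequence counts and reports pairing biases between the two chains. The following is a minimal sketch of both ideas under stated assumptions, not the OTS pipeline or the authors' statistical method. It assumes a hypothetical `PairedTCR` record, treats a pair as non-redundant when its (α, β) amino acid sequences are unique, and measures pairing bias as the observed frequency of each (V-α, V-β) gene combination divided by the frequency expected if the chains paired independently.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PairedTCR(NamedTuple):
    """Hypothetical minimal record for one observed paired-chain TCR."""
    alpha_aa: str  # full-length alpha-chain amino acid sequence
    beta_aa: str   # full-length beta-chain amino acid sequence
    v_alpha: str   # alpha V gene call, e.g. "TRAV1-2"
    v_beta: str    # beta V gene call, e.g. "TRBV6-4"


def count_redundant_and_unique(records: Iterable[PairedTCR]) -> tuple[int, int]:
    """Return (redundant count, non-redundant count).

    Assumption: two records are redundant copies if their alpha and beta
    amino acid sequences are both identical. OTS may use other criteria.
    """
    records = list(records)
    unique_pairs = {(r.alpha_aa, r.beta_aa) for r in records}
    return len(records), len(unique_pairs)


def v_gene_pairing_ratios(records: Iterable[PairedTCR]) -> dict[tuple[str, str], float]:
    """Observed / expected count of each (V-alpha, V-beta) combination.

    'Expected' assumes the alpha and beta V genes pair independently:
    expected = count(V-alpha) * count(V-beta) / N.
    Ratios above 1 suggest preferential pairing; below 1, depletion.
    Pass deduplicated records to avoid weighting expanded clones.
    """
    records = list(records)
    n = len(records)
    pair_counts = Counter((r.v_alpha, r.v_beta) for r in records)
    alpha_counts = Counter(r.v_alpha for r in records)
    beta_counts = Counter(r.v_beta for r in records)
    ratios: dict[tuple[str, str], float] = {}
    for (va, vb), observed in pair_counts.items():
        expected = alpha_counts[va] * beta_counts[vb] / n
        ratios[(va, vb)] = observed / expected
    return ratios
```

For example, four records in which one (α, β) pair appears twice give a redundant count of 4 and a non-redundant count of 3. A real analysis would also need a significance test and correction for multiple comparisons, which this sketch omits.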

Keywords: CP: Immunology; T cell receptor, TCR; alpha; beta; coherence; language model; paired sequences; repertoire.

MeSH terms

  • Data Mining / methods
  • Humans
  • Receptors, Antigen, T-Cell* / immunology
  • Receptors, Antigen, T-Cell* / metabolism

Substances

  • Receptors, Antigen, T-Cell