On the viability of unsupervised T-cell receptor sequence clustering for epitope preference

Bioinformatics. 2019 May 1;35(9):1461-1468. doi: 10.1093/bioinformatics/bty821.

Abstract

Motivation: The T-cell receptor (TCR) is responsible for recognizing epitopes presented on cell surfaces. Linking TCR sequences to their ability to target specific epitopes is an unsolved problem of great interest. In particular, it is currently unknown how dissimilar TCR sequences can be before they no longer bind the same epitope. This question is confounded by the fact that there are many ways to define the similarity between two TCR sequences. Here we investigate both issues in the context of unsupervised TCR sequence clustering.

Results: We provide an overview of the performance of various distance metrics on two large independent datasets with 412 and 2835 TCR sequences, respectively. Our results confirm the presence of structurally distinct TCR groups that target identical epitopes. In addition, we put forward several recommendations for performing unsupervised T-cell receptor sequence clustering.
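
To make the idea of distance-based unsupervised clustering concrete, the following is a minimal Python 3 sketch, not the authors' implementation. It assumes Levenshtein (edit) distance as the similarity measure and a single-linkage grouping at an arbitrary threshold of one edit; the abstract does not name the specific metrics or thresholds that were evaluated. The function names and the CDR3-like example strings are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster_by_threshold(sequences, max_distance=1):
    """Single-linkage grouping: any two sequences within max_distance are merged.

    Assumption: max_distance=1 is an illustrative choice, not a recommendation
    taken from the paper.
    """
    parent = list(range(len(sequences)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(sequences)), 2):
        if levenshtein(sequences[i], sequences[j]) <= max_distance:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(sequences):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


# Hypothetical CDR3-like strings, made up for this example.
example_cdr3 = [
    "CASSLAPGATNEKLFF",
    "CASSLAPGTTNEKLFF",
    "CASSIRSSYEQYF",
    "CASSIRSAYEQYF",
    "CAWSVGQGYEQYF",
]
clusters = cluster_by_threshold(example_cdr3, max_distance=1)
```

Because the all-pairs comparison is quadratic in the number of sequences, this approach is only practical for datasets of modest size, such as the few hundred to few thousand sequences mentioned above. Swapping `levenshtein` for another distance function is the natural way to compare metrics under the same grouping rule.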

Availability and implementation: Source code, implemented in Python 3, is available at https://github.com/pmeysman/TCRclusteringPaper.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Epitopes
  • Receptors, Antigen, T-Cell / immunology*
  • Software*

Substances

  • Epitopes
  • Receptors, Antigen, T-Cell