Antigen-specific T cell receptor (TCR) sequences can have prognostic, predictive, and therapeutic value, but decoding the specificity of TCR recognition remains challenging. Unlike DNA strands that base pair, TCRs bind to their targets with different orientations and different lengths, which complicates comparisons. We present scanning parametrized by normalized TCR length (SPAN-TCR) to analyze antigen-specific TCR CDR3 sequences and identify patterns driving TCR-pMHC specificity. Using entropic analysis, SPAN-TCR identifies 2-mer motifs that decrease the diversity (entropy) of CDR3s. These motifs are the most common patterns that can predict CDR3 composition, and we identify "essential" motifs that decrease entropy in the same CDR3 α or β chain containing the 2-mer, and "super-essential" motifs that decrease entropy in both chains. Molecular dynamics analysis further suggests that these motifs may play important roles in binding. We then employ SPAN-TCR to resolve similarities in TCR repertoires against different antigens using public databases of TCR sequences.
Keywords: CDR3 domains; SARS-COV-2; T cell receptors; T cells; antigens.
Copyright © 2023 Elsevier Inc. All rights reserved.