Heavy chain sequence-based classifier for the specificity of human antibodies

Brief Bioinform. 2022 Jan 17;23(1):bbab516. doi: 10.1093/bib/bbab516.

Abstract

Antibodies specifically bind to antigens and are an essential part of the immune system. Hence, antibodies are powerful tools in research and diagnostics. High-throughput sequencing technologies have promoted comprehensive profiling of the immune repertoire, yielding large numbers of antibody sequences that remain to be analyzed. In this study, antibody sequences were downloaded from the IMGT/LIGM-DB and Sequence Read Archive databases. Contributing features from antibody heavy chains were formulated as numerical inputs and fed into an ensemble machine learning classifier to classify the antigen specificity of six classes of antibodies, namely anti-HIV-1, anti-influenza virus, anti-pneumococcal polysaccharide, anti-citrullinated protein, anti-tetanus toxoid and anti-hepatitis B virus. The classifier was validated using cross-validation and a testing dataset. The ensemble classifier achieved a macro-average area under the receiver operating characteristic curve (AUC) of 0.9246 in 10-fold cross-validation and 0.9264 on the testing dataset. Among the contributing features, the complementarity-determining regions accounted for 53.1% of the contribution and the framework regions for 46.9%, and the amino acid mutation rates ranked first and second among the top five contributing features. The classifier and insights provided in this study could promote the mechanistic study, isolation and utilization of potential therapeutic antibodies.
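For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-average AUC can be computed for a multi-class classifier. It assumes a one-vs-rest scheme with an unweighted mean over classes, which the abstract does not specify. The function names (`binary_auc`, `macro_ovr_auc`) and the toy labels and scores are hypothetical illustrations; they are not the authors' code or data.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive example scores
    higher than a random negative one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int],
                  proba: Sequence[Sequence[float]],
                  n_classes: int) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed averaging scheme)."""
    aucs: List[float] = []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]  # class k vs. the rest
        scores = [row[k] for row in proba]             # predicted score for class k
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


# Hypothetical toy data: three classes, two examples each.
toy_y_true = [0, 0, 1, 1, 2, 2]
toy_proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
toy_macro_auc = macro_ovr_auc(toy_y_true, toy_proba, n_classes=3)
print(f"macro-average one-vs-rest AUC on toy data: {toy_macro_auc:.4f}")
```

Because the macro average weights every class equally, a rare specificity class influences the score as much as a common one. The pairwise comparison used here is quadratic in the number of examples and is meant only for illustration.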

Keywords: antibody; ensemble machine learning; heavy chain; specificity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence*
  • Antibodies / chemistry*
  • Antibody Specificity
  • Complementarity Determining Regions
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Machine Learning*
  • ROC Curve

Substances

  • Antibodies
  • Complementarity Determining Regions