Interpretable deep learning reveals the role of an E-box motif in suppressing somatic hypermutation of AGCT motifs within human immunoglobulin variable regions

Front Immunol. 2024 May 28:15:1407470. doi: 10.3389/fimmu.2024.1407470. eCollection 2024.

Abstract

Introduction: Somatic hypermutation (SHM) of immunoglobulin variable (V) regions by activation-induced deaminase (AID) is essential for robust, long-term humoral immunity against pathogen and vaccine antigens. AID mutates cytosines preferentially within WRCH motifs (where W=A or T, R=A or G and H=A, C or T). However, it has been consistently observed that the mutability of WRCH motifs varies substantially, with large variations in mutation frequency even between multiple occurrences of the same motif within a single V region. This has led to the notion that the immediate sequence context of WRCH motifs contributes to mutability. Recent studies have highlighted the potential role of local DNA sequence features in promoting mutagenesis of AGCT, a commonly mutated WRCH motif. Intriguingly, AGCT motifs closer to the 5' ends of V regions, within the framework 1 (FW1) sub-region, mutate less frequently, suggesting an SHM-suppressing sequence context.
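
The WRCH definition above maps directly onto a pattern search. The following is a minimal Python sketch, not part of the study's code: the function name `find_wrch` and the toy sequence are hypothetical, and only the given strand is scanned. A lookahead is used so that overlapping motifs are also reported.

```python
import re

# IUPAC codes as defined in the abstract: W = A/T, R = A/G, H = A/C/T.
# The zero-width lookahead lets overlapping motif occurrences be found.
WRCH = re.compile(r"(?=([AT][AG]C[ACT]))")

def find_wrch(seq):
    """Return (0-based start, motif) for every WRCH match on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in WRCH.finditer(seq)]

# Hypothetical toy sequence, not taken from any V region.
toy = "GGCAGCTGAAAGCTTT"
hits = find_wrch(toy)
agct_positions = [pos for pos, motif in hits if motif == "AGCT"]
```

Because AGCT is its own reverse complement, an AGCT found on one strand is also an AGCT (and hence a WRCH motif) on the opposite strand.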

Methods: Here, we systematically examined the basis of AGCT positional biases in human SHM datasets with DeepSHM, a machine-learning model designed to predict SHM patterns. This was combined with integrated gradients, an interpretability method, to interrogate the basis of DeepSHM predictions.
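
Integrated gradients attributes a prediction to each input feature by accumulating the model's gradient along a straight path from a baseline input to the actual input, then scaling by the difference between input and baseline. The sketch below illustrates the idea in plain Python under loud assumptions: the "model" is a toy logistic function standing in for a trained network (it is not DeepSHM), the all-zeros baseline and the weights are hypothetical, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, bias):
    """Toy stand-in for a trained network: logistic regression on a feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)

def model_grad(x, w, bias):
    """Analytic gradient of the toy model with respect to its input."""
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, bias, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        grad = model_grad(point, w, bias)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by (input - baseline) per feature.
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]

# Hypothetical weights and inputs, chosen only for illustration.
w, bias = [1.5, -2.0, 0.5], 0.1
x, baseline = [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline, w, bias)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline. A feature with a negative attribution pushes the prediction down relative to the baseline, which is the sense in which a flanking residue can be read as suppressive.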

Results: DeepSHM predicted the observed positional differences in mutation frequencies at AGCT motifs with high accuracy. For the conserved, lowly mutating AGCT motifs in FW1, integrated gradients predicted a large negative contribution of 5'C and 3'G flanking residues, suggesting that a CAGCTG context in this location was suppressive for SHM. CAGCTG is the recognition motif for E-box transcription factors, including E2A, which has been implicated in SHM. Indeed, we found a strong, inverse relationship between E-box motif fidelity and mutation frequency. Moreover, E2A was found to associate with the V region locale in two human B cell lines. Finally, analysis of human SHM datasets revealed that naturally occurring mutations in the 3'G flanking residues, which effectively ablate the E-box motif, were associated with a significantly increased rate of AGCT mutation.
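
The abstract does not say how E-box motif fidelity was scored. One simple way to make the idea concrete, offered purely as an assumption, is to count mismatches between CAGCTG and the six-base window formed by an AGCT motif plus one flanking base on each side. The function name `ebox_mismatches` and the example sequences are hypothetical.

```python
EBOX = "CAGCTG"  # E-box recognition motif named in the abstract

def ebox_mismatches(seq, agct_start):
    """Count mismatches to CAGCTG in the 6-mer around an AGCT motif.

    agct_start is the 0-based index of the A in AGCT. Returns None when
    the motif sits at a sequence edge and a flanking base is missing.
    """
    start, end = agct_start - 1, agct_start + 5
    if start < 0 or end > len(seq):
        return None
    window = seq[start:end].upper()
    return sum(a != b for a, b in zip(window, EBOX))

# Hypothetical examples: an intact E-box, and one with the 3' G replaced.
intact = ebox_mismatches("GCAGCTGA", 2)
ablated = ebox_mismatches("GCAGCTAA", 2)
```

Under this scoring, a value of 0 means the AGCT sits in a perfect CAGCTG context, and a substitution at the 3' G flank raises the count to 1, corresponding to the E-box-ablating mutations described above.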

Discussion: Our results suggest an antagonistic relationship between mutation frequency and the binding of E-box factors like E2A at specific AGCT motif contexts and, therefore, highlight a new, suppressive mechanism regulating local SHM patterns in human V regions.

Keywords: E-box transcription factors; E2A; activation induced deaminase (AID); deep learning; immunoglobulin heavy chain; integrated gradients; somatic hypermutation (SHM).

MeSH terms

  • Amino Acid Motifs
  • Cytidine Deaminase / genetics
  • Cytidine Deaminase / metabolism
  • Deep Learning*
  • Humans
  • Immunoglobulin Variable Region* / genetics
  • Mutation
  • Nucleotide Motifs*
  • Somatic Hypermutation, Immunoglobulin* / genetics

Substances

  • Immunoglobulin Variable Region
  • AICDA (activation-induced cytidine deaminase)
  • Cytidine Deaminase

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by grant NIH R01AI132507 to TM. The IMP is core funded by Boehringer Ingelheim. The funders had no role in study design, data collection, and interpretation or the decision to submit the work for publication.