This research aims to identify and characterize antibody escape mutations in the N-terminal domain (NTD) and receptor-binding domain (RBD) of SARS-CoV-2 by combining prompt engineering-enabled LLMs (large language models) with instigative bioinformatics techniques. We used two LLMs (ChatGPT and Mistral 7B) and one MLLM (the Gemini model) to retrieve the significant NTD and RBD mutations, and then characterized the retrieved mutations using in silico servers. The models retrieved 15 significant NTD mutations (six deletions and nine point mutations) and 17 RBD point mutations. We characterized these mutations in terms of distribution, count, ΔΔG of mutation (ΔΔG stability mCSM, ΔΔG stability DUET, ΔΔG stability SDM) to determine whether each mutation is stabilising or destabilising, interaction interface, distance to the PPI interface, change in vibrational entropy energy (ΔΔSVib ENCoM), and change in flexibility. The ΔΔG, interactions, and related parameters of every mutation were analyzed using the trimeric Spike protein complex. Of the five NTD mutations analyzed, two were destabilising (G142D, R190S) and three were stabilising (D215G, A222V, and R246I). Some RBD mutations were entirely destabilising (K417N, K417T, L452R, F490S), whereas N440K, N460K, and Q493R showed stabilising and neutral properties. With this strategy, LLMs and MLLMs can help to fight future pandemic viruses by quickly identifying mutations and their significance.
Keywords: Antibody escape; LLM; Mutations; Prompt engineering; SARS-CoV-2.
Copyright © 2024 Elsevier B.V. All rights reserved.