Prompt engineering-enabled LLM or MLLM and instigative bioinformatics pave the way to identify and characterize the significant SARS-CoV-2 antibody escape mutations

Chiranjib Chakraborty; Manojit Bhattacharya; Soumen Pal; Sang-Soo Lee

doi:10.1016/j.ijbiomac.2024.138547

Int J Biol Macromol. 2024 Dec 8:287:138547. doi: 10.1016/j.ijbiomac.2024.138547. Online ahead of print.

Authors

Chiranjib Chakraborty¹, Manojit Bhattacharya², Soumen Pal³, Sang-Soo Lee⁴

Affiliations

¹ Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal 700126, India. Electronic address: [email protected].
² Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore 756020, Odisha, India.
³ School of Mechanical Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
⁴ Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon, Gangwon-Do 24252, Republic of Korea.

PMID: 39657873
DOI: 10.1016/j.ijbiomac.2024.138547

Abstract

The research aims to identify and characterize the antibody escape mutations in the NTD and RBD regions of SARS-CoV-2 using prompt engineering-enabled LLMs (large language models) combined with instigative bioinformatics techniques. We used two LLMs (ChatGPT and Mistral 7B) and one MLLM (the Gemini model) to retrieve the significant NTD and RBD mutations, which were then characterized using in silico servers. We retrieved 15 significant NTD mutations (six deletions and nine point mutations) and 17 RBD point mutations. We further characterized them in terms of distribution, count, ΔΔG of mutation (ΔΔG^stability mCSM, ΔΔG^stability DUET, ΔΔG^stability SDM) to determine whether each mutation is stabilising or destabilising, interaction interface, distance to the PPI interface, change in vibrational entropy energy (ΔΔS_Vib ENCoM), and change in flexibility. Every mutation's ΔΔG, interactions, and related parameters were analyzed using the trimeric Spike protein complex. Of the five NTD mutations analyzed, two are destabilising (G142D, R190S) and three are stabilising (D215G, A222V, and R246I). Some RBD mutations are entirely destabilising (K417N, K417T, L452R, F490S), while N440K, N460K, and Q493R show stabilising and neutral properties. With our strategy, LLMs and MLLMs can help to fight future pandemic viruses by quickly identifying mutations and their significance.
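
The point-mutation labels in the abstract follow the usual wild-type residue, position, mutant residue notation (for example, G142D replaces glycine at position 142 with aspartate). As a minimal sketch of how such labels might be parsed and sorted into NTD and RBD, the Python below assumes approximate Spike domain boundaries (residues 14-305 for the NTD and 319-541 for the RBD). These boundaries and all function names are illustrative assumptions, not taken from the paper, and deletions are not handled.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

# Approximate Spike domain boundaries (ASSUMPTION: commonly cited ranges,
# not stated in the abstract; adjust to the annotation you rely on).
NTD_RANGE = (14, 305)
RBD_RANGE = (319, 541)


class Mutation(NamedTuple):
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number in the Spike sequence
    mutant: str      # one-letter code of the substituted residue


# Matches labels such as "G142D"; deletions are intentionally not covered.
_POINT_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_point_mutation(label: str) -> Mutation:
    """Split a label like 'K417N' into wild type, position, and mutant."""
    match = _POINT_MUTATION.match(label.strip().upper())
    if match is None:
        raise ValueError(f"Not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Mutation(wild_type, int(position), mutant)


def spike_region(position: int) -> str:
    """Return 'NTD', 'RBD', or 'other' under the assumed boundaries."""
    if NTD_RANGE[0] <= position <= NTD_RANGE[1]:
        return "NTD"
    if RBD_RANGE[0] <= position <= RBD_RANGE[1]:
        return "RBD"
    return "other"


def group_by_region(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Group mutation labels by the Spike region their position falls in."""
    groups: Dict[str, List[str]] = {}
    for label in labels:
        mutation = parse_point_mutation(label)
        groups.setdefault(spike_region(mutation.position), []).append(label)
    return groups


if __name__ == "__main__":
    # Labels taken from the abstract.
    print(group_by_region(["G142D", "R190S", "K417N", "L452R", "Q493R"]))
```

Under these assumed boundaries, every NTD and RBD point mutation named in the abstract falls in its stated region.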

Keywords: Antibody escape; LLM; Mutations; Prompt engineering; SARS-CoV-2.