ConoDL: a deep learning framework for rapid generation and prediction of conotoxins

J Comput Aided Mol Des. 2024 Dec 26;39(1):4. doi: 10.1007/s10822-024-00582-0.

Abstract

Conotoxins are small, disulfide-rich, bioactive peptides with notable pharmacological potential and a wide range of applications. However, traditional methods can explore only a small part of the vast molecular space of conotoxins, so new approaches are urgently needed. Recently, deep learning (DL)-based methods have advanced to the molecular generation of proteins and peptides. Nevertheless, the limited data and the intricate structure of conotoxins constrain the application of DL models to conotoxin generation. We propose ConoDL, a framework for the generation and prediction of conotoxins, comprising an end-to-end conotoxin generation model (ConoGen) and a conotoxin prediction model (ConoPred). ConoGen employs transfer learning and a large language model (LLM) to tackle the challenges in conotoxin generation. ConoPred then filters the artificial conotoxins generated by ConoGen, narrowing the scope for subsequent research. A comprehensive evaluation of peptide properties at both the sequence and structure levels indicates that the artificial conotoxins generated by ConoDL resemble natural conotoxins to some degree. Furthermore, ConoDL has generated artificial conotoxins with novel cysteine scaffolds. ConoDL may therefore uncover new cysteine scaffolds and conotoxin molecules, facilitating further exploration of the molecular space of conotoxins and the discovery of pharmacologically active variants.
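
To make the notion of a "novel cysteine scaffold" concrete, the following is a minimal Python sketch, not the ConoDL implementation. It assumes a scaffold can be summarized by writing each run of consecutive cysteines as a block and joining the blocks with hyphens (for example, CC-C-C). The function names, the toy sequences, and the small set of "known" scaffolds are hypothetical placeholders chosen for illustration; they are not taken from the paper or from a curated database.

```python
import re


def cysteine_scaffold(sequence: str) -> str:
    """Summarize a peptide by its cysteine pattern, e.g. 'CC-C-C'.

    Each run of consecutive cysteines becomes one block; blocks are
    joined with '-' to mark at least one intervening non-cysteine residue.
    A sequence without cysteines yields an empty string.
    """
    runs = re.findall(r"C+", sequence.upper())
    return "-".join(runs)


def split_by_scaffold(candidates, known_scaffolds):
    """Partition candidate sequences by whether their scaffold is already known."""
    known, novel = [], []
    for seq in candidates:
        scaffold = cysteine_scaffold(seq)
        target = known if scaffold in known_scaffolds else novel
        target.append((seq, scaffold))
    return known, novel


# Hypothetical placeholder set; a real analysis would use a curated list.
KNOWN_SCAFFOLDS = {"CC-C-C", "C-C-CC-C-C"}

# Toy candidate sequences, invented for illustration only.
candidates = ["GCCSDPRCAWRC", "ACKGCTRCCPNC", "GCACCRKCCTC"]

known, novel = split_by_scaffold(candidates, KNOWN_SCAFFOLDS)
for seq, scaffold in novel:
    print(f"novel scaffold {scaffold} in {seq}")
```

In a generate-then-filter workflow of the kind the abstract describes, a check like this would be applied only to sequences that have already passed the prediction step; how ConoDL itself defines and detects scaffold novelty is not specified here.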

Keywords: Conotoxins; Drug discovery; Generative and predictive models; Large language models; Pre-training models and transfer learning.

MeSH terms

  • Amino Acid Sequence
  • Conotoxins* / chemistry
  • Conotoxins* / pharmacology
  • Deep Learning*
  • Models, Molecular

Substances

  • Conotoxins

Grants and funding