Leveraging molecular structure and bioactivity with chemical language models for de novo drug design

Nat Commun. 2023 Jan 7;14(1):114. doi: 10.1038/s41467-022-35692-6.

Abstract

Generative chemical language models (CLMs) can be used for de novo molecular structure generation by learning from a textual representation of molecules. Here, we show that hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), a collection of virtual molecules was created with a generative CLM. This virtual compound library was refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ ligands. Several of the computer-generated molecular designs were commercially available, enabling fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified, highlighting the method's scaffold-hopping potential. Chemical synthesis and biochemical testing of two of the top-ranked de novo designed molecules and their derivatives corroborated the model's ability to generate PI3Kγ ligands with medium to low nanomolar activity for hit-to-lead expansion. The most potent compounds led to pronounced inhibition of PI3K-dependent Akt phosphorylation in a medulloblastoma cell model, demonstrating efficacy of PI3Kγ ligands in PI3K/Akt pathway repression in human tumor cells. The results support the use of hybrid CLMs for virtual compound screening and activity-focused molecular design.
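
The generate-then-refine workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the textual representation is SMILES (the abstract does not name one), uses a simplified regular-expression tokenizer, and replaces the trained bioactivity classifier with a hypothetical placeholder, toy_score. The names tokenize, refine_library, and toy_score are introduced here for illustration only.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Assumption: molecules are written as SMILES strings. This pattern is a
# simplified tokenizer, not a full SMILES grammar.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens a language model could consume."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def refine_library(
    candidates: Iterable[str],
    score: Callable[[str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Deduplicate generated strings and keep the top_k by predicted score."""
    unique = list(dict.fromkeys(candidates))  # drops repeats, keeps first-seen order
    ranked = sorted(((s, score(s)) for s in unique), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


def toy_score(smiles: str) -> float:
    """Hypothetical stand-in for a trained classifier: fraction of nitrogen tokens."""
    tokens = tokenize(smiles)
    return sum(t in ("N", "n") for t in tokens) / len(tokens)


if __name__ == "__main__":
    # Hypothetical "generated" library; a real one would be sampled from a CLM.
    library = ["CCO", "NCCN", "CCO", "c1ccncc1"]
    for smiles, s in refine_library(library, toy_score, top_k=2):
        print(smiles, round(s, 3))
```

In practice the scoring function would be the fine-tuned classifier's predicted probability of activity, and the candidate strings would first be checked for chemical validity with a cheminformatics toolkit; both steps are omitted here.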

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Drug Design
  • Humans
  • Ligands
  • Molecular Structure
  • Phosphatidylinositol 3-Kinase
  • Phosphatidylinositol 3-Kinases*
  • Proto-Oncogene Proteins c-akt*

Substances

  • Phosphatidylinositol 3-Kinases
  • Ligands
  • Proto-Oncogene Proteins c-akt
  • Phosphatidylinositol 3-Kinase