Advancing Adverse Drug Reaction Prediction with Deep Chemical Language Model for Drug Safety Evaluation

Int J Mol Sci. 2024 Apr 20;25(8):4516. doi: 10.3390/ijms25084516.

Abstract

The accurate prediction of adverse drug reactions (ADRs) is essential for comprehensive drug safety evaluation. Pre-trained deep chemical language models have emerged as powerful tools that automatically learn molecular structural features from large-scale datasets, and they show promise for the downstream prediction of molecular properties. However, the performance of pre-trained chemical language models in predicting ADRs, especially idiosyncratic ADRs induced by marketed drugs, remains largely unexplored. In this study, we propose combining MoLFormer-XL, a pre-trained model that encodes molecular features from canonical SMILES, with a CNN-based model to predict drug-induced QT interval prolongation (DIQT), drug-induced teratogenicity (DIT), and drug-induced rhabdomyolysis (DIR). Our results demonstrate that the proposed model outperforms conventional models applied in previous studies for predicting DIQT, DIT, and DIR. Notably, an analysis of the learned linear attention maps highlights amines, alcohols, ethers, and aromatic halogen compounds as strongly associated with the three types of ADRs. These findings hold promise for enhancing drug discovery pipelines and reducing the drug attrition rate due to safety concerns.
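
The general idea of feeding a pre-trained molecular embedding into a CNN-based classifier can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify. The embedding values, kernels, and output weights below are hypothetical, untrained placeholders, and the helper name predict_adr_probability is invented for illustration. A real pipeline would obtain the embedding from the pre-trained encoder and learn the weights from labeled DIQT, DIT, or DIR data.

```python
import math


def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation of a list with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_adr_probability(embedding, kernels, out_weights, out_bias):
    """Score one molecule embedding with a tiny 1D CNN head.

    embedding   : list of floats, a stand-in for a pre-trained encoder output
    kernels     : list of convolution kernels (hypothetical, untrained)
    out_weights : one weight per kernel for the final linear layer
    out_bias    : bias of the final linear layer
    """
    # Convolve, apply ReLU, then global max pooling: one feature per kernel.
    pooled = [max(relu(conv1d(embedding, k))) for k in kernels]
    # Linear layer followed by a sigmoid gives a binary ADR probability.
    logit = sum(w * f for w, f in zip(out_weights, pooled)) + out_bias
    return sigmoid(logit)


# Toy values only; a real embedding would have hundreds of dimensions.
embedding = [0.5, -1.0, 2.0, 0.0, 1.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
prob = predict_adr_probability(embedding, kernels, [1.0, -1.0], 0.0)
print(f"Predicted ADR probability: {prob:.3f}")
```

In this sketch, a separate head of this kind would be trained for each endpoint (DIQT, DIT, DIR), which is one plausible reading of the setup rather than a confirmed detail.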

Keywords: adverse drug reactions; deep chemical language model; deep learning; drug safety evaluation; structural alerts.

MeSH terms

  • Deep Learning
  • Drug-Related Side Effects and Adverse Reactions*
  • Humans
  • Long QT Syndrome / chemically induced
  • Models, Chemical
  • Rhabdomyolysis / chemically induced