Augmented drug combination dataset to improve the performance of machine learning models predicting synergistic anticancer effects

Mengmeng Liu; Gopal Srivastava; J Ramanujam; Michal Brylinski

doi:10.1038/s41598-024-51940-9

Sci Rep. 2024 Jan 18;14(1):1668. doi: 10.1038/s41598-024-51940-9.

Authors

Mengmeng Liu^#¹, Gopal Srivastava^#², J Ramanujam^{1,3}, Michal Brylinski^{4,5}

Affiliations

¹ Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA.
² Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA.
³ Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA.
⁴ Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA.
⁵ Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, 70803, USA.

^# Contributed equally.

Abstract

Combination therapy has gained popularity in cancer treatment because it enhances treatment efficacy and helps overcome drug resistance. Although machine learning (ML) techniques have become indispensable tools for discovering new drug combinations, the currently available data on drug combination therapy may be insufficient to build high-precision models. We developed a data augmentation protocol to scale up the existing anti-cancer drug synergy dataset in an unbiased manner. Using a new drug similarity metric, we augmented the synergy data by substituting a compound in a drug combination instance with another molecule that exhibits highly similar pharmacological effects. Using this protocol, we upscaled the AZ-DREAM Challenges dataset from 8,798 to 6,016,697 drug combinations. Comprehensive performance evaluations show that ML models trained on the augmented data consistently achieve higher accuracy than those trained solely on the original dataset. Our data augmentation protocol provides a systematic and unbiased approach to generating larger and more diverse drug combination datasets, enabling the development of more precise and effective ML models. The protocol presented in this study could serve as a foundation for future research aimed at discovering novel and effective drug combinations for cancer treatment.
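
The substitution step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the similarity metric, the cutoff, or how synergy labels are assigned to new instances. The `Combo` record, the `similar` lookup table, the `threshold` value of 0.9, and the choice to let a substituted combination inherit the original synergy score are all assumptions made for illustration only.

```python
from typing import NamedTuple


class Combo(NamedTuple):
    """One drug combination instance (hypothetical record layout)."""
    drug_a: str
    drug_b: str
    cell_line: str
    synergy: float


def augment(combos, similar, threshold=0.9):
    """Expand a list of combinations by swapping one drug at a time for a similar one.

    `similar` maps a drug name to (candidate, score) pairs taken from some
    precomputed similarity metric; `threshold` is an assumed cutoff.
    New instances inherit the original synergy label (an assumption).
    """
    seen = set()
    out = []

    def add(combo):
        # Treat (A, B) and (B, A) on the same cell line as one instance,
        # and skip degenerate pairs in which both drugs are identical.
        key = (frozenset((combo.drug_a, combo.drug_b)), combo.cell_line)
        if combo.drug_a != combo.drug_b and key not in seen:
            seen.add(key)
            out.append(combo)

    # Keep every original instance first.
    for combo in combos:
        add(combo)

    # Then substitute each drug in turn with its sufficiently similar candidates.
    for combo in combos:
        for candidate, score in similar.get(combo.drug_a, []):
            if score >= threshold:
                add(combo._replace(drug_a=candidate))
        for candidate, score in similar.get(combo.drug_b, []):
            if score >= threshold:
                add(combo._replace(drug_b=candidate))
    return out
```

Because each original instance can yield one new instance per qualifying substitute on either side, a few thousand measured combinations can expand by orders of magnitude, which is consistent in spirit with the growth reported above. The actual expansion factor depends entirely on the similarity metric and cutoff used.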

MeSH terms

Computational Biology* / methods
Drug Combinations
Drug Synergism
Drug Therapy, Combination
Machine Learning*

Substances

Drug Combinations
