Understanding drug-target interactions is crucial for identifying novel lead compounds, enhancing efficacy, and reducing toxicity. Phenotype-based approaches, like analyzing drug-induced gene expression changes, have shown effectiveness in drug discovery and precision medicine. However, experimentally determining gene expression for all relevant chemicals is impractical, limiting large-scale gene expression-based screening. In this study, we developed DIGERA (Drug-Induced Gene Expression Ranking Analysis), a Lasso-based ensemble framework utilizing LINCS L1000 data to predict drug-induced gene expression rankings. We created novel numerical features for chemicals, cell lines, and experimental conditions, allowing the prediction of gene expression rankings across eight key cell lines. DIGERA outperformed baseline models in the F1@K metric, demonstrating improved precision in gene expression ranking. We also combined DIGERA with an iterative fine-tuning process for de novo design, suggesting 10 PARP1 inhibitors with favorable predicted properties like binding affinity, synthetic accessibility, solubility, membrane permeability, drug-likeness, and similar gene expression ranking to olaparib. Notably, nine compounds were novel, and six analogs of these compounds had references linked to PARP1 inhibition. These results underscore DIGERA's potential to boost model performance and robustness through novel features and ensemble learning, aiding virtual screening for new PARP1 inhibitors.
Keywords: drug-induced gene expression; ensemble learning; machine learning; poly (ADP-ribose) polymerase 1; virtual screening.