Purpose: Drug development in oncology faces both a growing number of antineoplastic agents (ANAs) that are candidates for phase I clinical trials (P1CTs) and a high attrition rate before final approval. We aimed to develop a machine learning algorithm (RESOLVED2) that predicts drug development outcome and could support early go/no-go decisions after P1CTs by improving the selection of drugs suitable for further development.
Methods: PubMed abstracts of P1CTs reporting on ANAs were combined with pharmacologic data from the DrugBank5.0 database to model the time from the first P1CT publication to US Food and Drug Administration (FDA) approval (FDA approval-free survival). The RESOLVED2 model was trained with machine learning methods, and its performance was evaluated on an independent test set with the inverse probability of censoring weighted concordance index (IPCW).
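A weighted concordance index of this kind can be sketched as follows. This is a minimal, standard-library Python illustration of an inverse probability of censoring weighted (IPCW) concordance index in the style of Uno's C statistic, not the RESOLVED2 evaluation code. The function names, the tie conventions, and the toy inputs are assumptions made for illustration. The "event" is FDA approval, time is measured from the first P1CT publication, and a higher score is taken to mean earlier predicted approval.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring distribution.

    Censoring is treated as the "event" of interest here. Returns a
    function G(t) giving the estimated probability of still being
    uncensored just before time t. Tie conventions vary between
    implementations; this sketch uses the simplest one.
    """
    pairs = sorted(zip(times, events))
    steps = []          # (time, estimate just after that time)
    g = 1.0
    at_risk = len(pairs)
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_at_t = 0
        n_censored = 0
        while i < len(pairs) and pairs[i][0] == t:
            n_at_t += 1
            if not pairs[i][1]:
                n_censored += 1
            i += 1
        if n_censored:
            g *= 1.0 - n_censored / at_risk
        steps.append((t, g))
        at_risk -= n_at_t

    def G(t):
        value = 1.0
        for s, g_s in steps:
            if s < t:
                value = g_s
            else:
                break
        return value

    return G


def ipcw_concordance(times, events, scores, tau=None):
    """IPCW concordance index (Uno-style sketch).

    A pair (i, j) is usable when drug i was approved at time T_i and
    drug j was still unapproved after T_i. Each usable pair is weighted
    by 1 / G(T_i)^2 to compensate for censored observations. Pairs with
    T_i >= tau are ignored when a truncation time tau is given.
    """
    G = censoring_survival(times, events)
    num = 0.0
    den = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        if tau is not None and times[i] >= tau:
            continue
        weight = 1.0 / G(times[i]) ** 2
        for j in range(len(times)):
            if times[j] > times[i]:
                den += weight
                if scores[i] > scores[j]:
                    num += weight          # concordant pair
                elif scores[i] == scores[j]:
                    num += 0.5 * weight    # tied scores count half
    if den == 0.0:
        raise ValueError("no comparable pairs")
    return num / den


# Hypothetical toy data: years to approval or last follow-up,
# approval indicator, and model score (higher = earlier approval).
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [True, False, True, True]
toy_scores = [0.9, 0.1, 0.2, 0.5]
print(ipcw_concordance(toy_times, toy_events, toy_scores))
```

A value of 1 indicates that every usable pair is ranked correctly and 0.5 corresponds to random ranking. With no censored observations all weights equal 1 and the statistic reduces to the unweighted concordance index.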
Results: We identified 462 ANAs from PubMed that matched DrugBank5.0 entries (P1CT publication dates, 1972 to 2017). Among 1,411 variables, 28 were used by RESOLVED2 to model FDA approval-free survival, with an IPCW of 0.89 on the independent test set. RESOLVED2 outperformed a model based on efficacy/toxicity (IPCW, 0.69). In the test set at 6 years of follow-up, 73% (95% CI, 49% to 86%) of drugs predicted to be approved had been approved, whereas 92% (95% CI, 87% to 98%) of drugs predicted to be nonapproved had still not been approved (log-rank P < .001). A predicted approved drug was 16 times more likely to be approved than a predicted nonapproved drug (hazard ratio, 16.4; 95% CI, 8.40 to 32.2).
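The 6-year figures above are proportions at a fixed follow-up time in data where some drugs are censored. The sketch below shows one common way to obtain such a proportion, a Kaplan-Meier estimate of approval-free survival at a chosen horizon. It assumes the reported percentages are estimates of this kind, which the abstract does not state explicitly. The function name and the toy data are hypothetical and are not study data.

```python
def approval_free_survival(times, events, horizon):
    """Kaplan-Meier estimate of the probability that a drug is still
    unapproved at `horizon` (same time unit as `times`).

    times  : time to approval or to last follow-up
    events : True if the drug was approved at that time, False if censored
    """
    surv = 1.0
    approval_times = sorted(
        {t for t, approved in zip(times, events) if approved and t <= horizon}
    )
    for t in approval_times:
        # Drugs still under observation and unapproved just before t.
        at_risk = sum(1 for u in times if u >= t)
        # Drugs approved exactly at t.
        n_approved = sum(
            1 for u, approved in zip(times, events) if approved and u == t
        )
        surv *= 1.0 - n_approved / at_risk
    return surv


# Hypothetical toy group of five drugs (years, approval indicator).
toy_times = [2, 3, 5, 7, 8]
toy_events = [True, False, True, False, True]

still_unapproved = approval_free_survival(toy_times, toy_events, horizon=6)
print("approval-free at 6 years:", still_unapproved)
print("approved by 6 years:", 1.0 - still_unapproved)
```

Applied separately to the predicted-approved and predicted-nonapproved groups, one minus this estimate gives the share approved by the horizon and the estimate itself gives the share still unapproved.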
Conclusion: As early as P1CT completion, RESOLVED2 can accurately predict the time to FDA approval. We provide proof of concept that drug development outcome can be predicted by machine learning strategies.