Multiclass Synthetic Accessibility Prediction

Xinqi Li; Ryan Walsh; Waseem Abbas; Sergio Pascual-Diaz; Calum Hand; Rory Garland; Faiz Mohammad Khan; Nikhil Mohan Das; Vedant Desai; Mohamed AbouZleikha; Matthew A Clark

doi:10.1021/acs.jcim.4c01663

J Chem Inf Model. 2025 Jan 17. doi: 10.1021/acs.jcim.4c01663. Online ahead of print.

Affiliations

¹ X-Chem U.K., 1 Ashley Road, Altrincham, Cheshire WA14 2DT, U.K.
² X-Chem Canada, 4800 Rue Levy, Montreal QC H4R 2P1, Canada.
³ X-Chem Global HQ, 100 Beaver Street, Waltham, Massachusetts 02453, United States.

PMID: 39818777

Abstract

Evaluating synthetic accessibility of in silico molecules is an integral component of the drug discovery process. While the application of machine learning models to predict whether small molecules are easy or hard to synthesize has gained attention recently, predetermined thresholds and data set imbalances present challenges for these binary classification approaches. In this study, we introduce a novel multiclass fold-ensembled classification approach to predict the minimum number of steps needed to synthesize a small molecule. By ensembling the base models trained on multiple stratified subsampled folds, this approach effectively mitigates the impact of class imbalance through probability aggregation or voting aggregation strategies. Additionally, we propose fuzzy evaluation metrics that account for practical tolerances in predictions, providing a more flexible and realistic assessment of model performance. Through experimentation on two reaction benchmark data sets, we demonstrate the effectiveness of our model in a multiclass synthetic accessibility prediction task and the superiority of our proposed method over six existing models in binary synthetic accessibility prediction tasks.
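The abstract names two aggregation strategies and a tolerance-based evaluation but gives no formulas, so the following is only a minimal sketch of how such pieces might look, not the authors' implementation. It assumes each base model (one per subsampled fold) outputs a probability vector over step-count classes, that probability aggregation means averaging those vectors, that voting aggregation means a majority vote over per-model argmax classes, and that a "fuzzy" metric counts a prediction as correct when it is within a fixed number of steps of the true value. All function names and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def aggregate_probabilities(fold_probs: Sequence[Sequence[float]]) -> int:
    """Assumed probability aggregation: average class probabilities, then argmax."""
    n_models = len(fold_probs)
    n_classes = len(fold_probs[0])
    mean = [sum(p[c] for p in fold_probs) / n_models for c in range(n_classes)]
    return argmax(mean)


def aggregate_votes(fold_probs: Sequence[Sequence[float]]) -> int:
    """Assumed voting aggregation: each base model votes for its argmax class.

    Ties are broken in favor of the smaller class index (an arbitrary choice).
    """
    votes = Counter(argmax(p) for p in fold_probs)
    top = max(votes.values())
    return min(c for c, n in votes.items() if n == top)


def fuzzy_accuracy(y_true: Sequence[int], y_pred: Sequence[int], tolerance: int = 1) -> float:
    """Fraction of predictions within `tolerance` steps of the true step count."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical outputs of three base models for one molecule, four classes.
fold_probs = [
    [0.10, 0.60, 0.20, 0.10],
    [0.10, 0.40, 0.45, 0.05],
    [0.05, 0.30, 0.60, 0.05],
]
prob_class = aggregate_probabilities(fold_probs)  # class 1: highest mean probability
vote_class = aggregate_votes(fold_probs)          # class 2: two of three models vote for it

# Hypothetical true and predicted step counts for four molecules.
strict = fuzzy_accuracy([1, 2, 3, 5], [1, 3, 5, 5], tolerance=0)  # 0.5
fuzzy = fuzzy_accuracy([1, 2, 3, 5], [1, 3, 5, 5], tolerance=1)   # 0.75
```

In this toy case the two strategies disagree: averaging favors class 1 because one model is very confident in it, while voting favors class 2 because two models lean that way. The training of the base models on stratified subsampled folds is omitted here, since the abstract does not describe it in enough detail to sketch.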