Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis

Chemistry. 2023 Jan 27;29(6):e202202834. doi: 10.1002/chem.202202834. Epub 2022 Nov 27.

Abstract

Recent years have witnessed a boom in machine learning (ML) applications in chemistry, revealing the potential of data-driven prediction of synthesis performance. Digitalization and ML modelling are key strategies for fully exploiting the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of studies has demonstrated the importance of implementing chemical knowledge in ML, which improves a model's ability to make predictions that are challenging and often beyond human capability. This Minireview summarizes the cutting-edge embedding techniques and model designs for synthetic performance prediction reported up to June 2022, elaborating how chemical knowledge can be incorporated into machine learning. We hope that this Minireview, by merging organic synthesis tactics and chemical informatics, can provide a guide map and encourage chemists to revisit the digitalization and computerization of organic chemistry principles.

Keywords: machine learning; molecular embedding; organic synthesis; performance prediction; synthetic dataset.

Publication types

  • Review