Reaction conditions that are generally applicable to a wide variety of substrates are highly desired, especially in the pharmaceutical and chemical industries1–6. Although many approaches are available to evaluate the general applicability of developed conditions, a universal approach for efficiently discovering such conditions during optimization is rare. Here we report the design, implementation and application of reinforcement learning bandit optimization models7–10 to identify generally applicable conditions through efficient condition sampling and evaluation of experimental feedback. Statistical benchmarking on existing datasets showed high accuracies for identifying general conditions, with improvements of up to 31% over baselines that mimic state-of-the-art optimization approaches. A palladium-catalysed imidazole C–H arylation reaction, an aniline amide coupling reaction and a phenol alkylation reaction were investigated experimentally to evaluate the use cases and functionalities of the bandit optimization model in practice. In all three cases, the reaction conditions that were most generally applicable yet not well studied for the respective reaction were identified after surveying less than 15% of the expert-designed reaction space.
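To make the idea concrete, the following is a minimal sketch of how a multi-armed bandit can allocate a limited experimental budget across candidate conditions and rank them by cross-substrate success rate. It uses a generic UCB1 policy rather than the authors' model, and the condition names, the `run_experiment` stub, the yield cutoff and the budget are all hypothetical placeholders, not values from the paper.

```python
import math
import random

# Hypothetical candidate conditions; names are illustrative placeholders only.
CONDITIONS = ["Pd(OAc)2 / ligand A", "Pd2(dba)3 / ligand B", "Pd(PPh3)4", "Pd/C"]


def run_experiment(condition: str, substrate: str) -> float:
    """Stand-in for a real experiment or dataset lookup; returns a yield in [0, 1]."""
    rng = random.Random(hash((condition, substrate)) % (2**32))
    return rng.random()


def ucb1_select(counts: dict, successes: dict, t: int, c: float = 1.4) -> str:
    """Pick the condition with the highest UCB1 score, trying each arm once first."""
    best, best_score = None, float("-inf")
    for cond, n in counts.items():
        if n == 0:
            return cond  # explore untried conditions first
        mean = successes[cond] / n
        score = mean + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = cond, score
    return best


def bandit_search(substrates, budget: int = 20, yield_cutoff: float = 0.5):
    """Allocate a fixed experimental budget across conditions; reward = 1 when the
    observed yield clears the cutoff, so arm means estimate cross-substrate generality."""
    counts = {cond: 0 for cond in CONDITIONS}
    successes = {cond: 0 for cond in CONDITIONS}
    for t in range(1, budget + 1):
        cond = ucb1_select(counts, successes, t)
        substrate = random.choice(substrates)  # sample a substrate for this round
        counts[cond] += 1
        successes[cond] += int(run_experiment(cond, substrate) >= yield_cutoff)
    # Rank conditions by estimated success rate across the sampled substrates.
    return sorted(CONDITIONS, key=lambda c: successes[c] / max(counts[c], 1), reverse=True)


if __name__ == "__main__":
    ranking = bandit_search(substrates=["substrate A", "substrate B", "substrate C"])
    print("Conditions ranked by estimated generality:", ranking)
```

Thresholding the yield turns each experiment into a binary reward, so the per-arm mean approximates the fraction of substrates for which a condition performs acceptably, which is one simple proxy for "generality"; the actual reward design and sampling policy in the paper may differ.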