Reaction conditions that are generally applicable to a wide variety of substrates are highly desired, especially in the pharmaceutical and chemical industries1–6. Although many approaches are available to evaluate the general applicability of developed conditions, a universal approach for efficiently discovering such conditions during optimization is rare. Here we report the design, implementation and application of reinforcement learning bandit optimization models7–10 to identify generally applicable conditions through efficient condition sampling and evaluation of experimental feedback. Statistical benchmarking on existing datasets showed high accuracies for identifying general conditions, with improvements of up to 31% over baselines that mimic state-of-the-art optimization approaches. A palladium-catalysed imidazole C–H arylation reaction, an aniline amide coupling reaction and a phenol alkylation reaction were investigated experimentally to evaluate the use cases and functionalities of the bandit optimization model in practice. In all three cases, the reaction conditions that were most generally applicable yet not well studied for the respective reaction were identified after surveying less than 15% of the expert-designed reaction space.
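To make the idea concrete, the following is a minimal sketch of how a multi-armed bandit can allocate a limited experimental budget across candidate conditions and rank them by cross-substrate success rate. It uses a generic UCB1 policy rather than the authors' model, and the condition names, the `run_experiment` stub, the yield cutoff and the budget are all hypothetical placeholders, not values from the paper.

```python
import math
import random

# Hypothetical candidate conditions; names are illustrative placeholders only.
CONDITIONS = ["Pd(OAc)2 / ligand A", "Pd2(dba)3 / ligand B", "Pd(PPh3)4", "Pd/C"]


def run_experiment(condition: str, substrate: str) -> float:
    """Stand-in for a real experiment or dataset lookup; returns a yield in [0, 1]."""
    rng = random.Random(hash((condition, substrate)) % (2**32))
    return rng.random()


def ucb1_select(counts: dict, successes: dict, t: int, c: float = 1.4) -> str:
    """Pick the condition with the highest UCB1 score, trying each arm once first."""
    best, best_score = None, float("-inf")
    for cond, n in counts.items():
        if n == 0:
            return cond  # explore untried conditions first
        mean = successes[cond] / n
        score = mean + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = cond, score
    return best


def bandit_search(substrates, budget: int = 20, yield_cutoff: float = 0.5):
    """Allocate a fixed experimental budget across conditions; reward = 1 when the
    observed yield clears the cutoff, so arm means estimate cross-substrate generality."""
    counts = {cond: 0 for cond in CONDITIONS}
    successes = {cond: 0 for cond in CONDITIONS}
    for t in range(1, budget + 1):
        cond = ucb1_select(counts, successes, t)
        substrate = random.choice(substrates)  # sample a substrate for this round
        counts[cond] += 1
        successes[cond] += int(run_experiment(cond, substrate) >= yield_cutoff)
    # Rank conditions by estimated success rate across the sampled substrates.
    return sorted(CONDITIONS, key=lambda c: successes[c] / max(counts[c], 1), reverse=True)


if __name__ == "__main__":
    ranking = bandit_search(substrates=["substrate A", "substrate B", "substrate C"])
    print("Conditions ranked by estimated generality:", ranking)
```

Thresholding the yield turns each experiment into a binary reward, so the per-arm mean approximates the fraction of substrates for which a condition performs acceptably, which is one simple proxy for "generality"; the actual reward design and sampling policy in the paper may differ.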