Understanding the evolution of a de novo molecule generator via characteristic functional group monitoring

Sci Technol Adv Mater. 2022 Jun 1;23(1):352-360. doi: 10.1080/14686996.2022.2075240. eCollection 2022.

Abstract

Recently, artificial intelligence (AI)-enabled de novo molecular generators (DNMGs) have automated molecular design based on data-driven or simulation-based property estimates. In some domains, such as the game of Go, where AI has surpassed human players, humans are now trying to learn the best strategies from AI. To understand how a DNMG optimizes molecules, we propose an algorithm called characteristic functional group monitoring (CFGM). Given a time series of generated molecules, CFGM tracks the functional groups that are statistically enriched relative to the training data. In the task of maximizing the absorption wavelength of pure organic molecules (consisting of H, C, N, and O), we identified a strategic shift from diketone and aniline derivatives to quinone derivatives. In addition, CFGM led us to the hypothesis that 1,2-quinone is an unconventional chromophore, which was verified by chemical synthesis. This study shows that human experts may be able to learn from DNMGs to expand their ability to discover functional molecules.
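The abstract describes CFGM only at a high level: for each time window of generated molecules, compare functional-group frequencies with those in the training data and flag the statistically enriched groups. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Functional groups are assumed to be detected upstream (for example by substructure matching) and stored as a set of labels per molecule. Enrichment is scored here with a one-sided Fisher's exact test. The function names `fisher_enrichment_p`, `characteristic_groups`, and `monitor`, as well as the threshold `alpha`, are hypothetical choices. The paper's actual statistic, thresholds, and functional-group vocabulary may differ.

```python
from math import comb
from typing import List, Set, Tuple


def fisher_enrichment_p(a: int, n1: int, b: int, n2: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment (assumed statistic).

    a of n1 generated molecules carry the group; b of n2 training molecules do.
    Returns P(X >= a) under the hypergeometric null, in which the a + b
    group-bearing molecules are split at random between the two sets.
    """
    k = a + b            # pooled count of molecules carrying the group
    n = n1 + n2          # pooled number of molecules
    denom = comb(n, k)
    upper = min(k, n1)   # at most n1 carriers can fall in the generated set
    # comb(n2, k - x) is 0 when k - x > n2, so impossible splits contribute 0
    return sum(comb(n1, x) * comb(n2, k - x) for x in range(a, upper + 1)) / denom


def characteristic_groups(
    generated: List[Set[str]],
    training: List[Set[str]],
    alpha: float = 0.01,  # hypothetical significance threshold
) -> List[Tuple[str, float]]:
    """Return (group, p-value) pairs enriched in `generated` vs `training`."""
    if not generated or not training:
        return []
    n1, n2 = len(generated), len(training)
    results = []
    for g in sorted(set().union(*generated)):
        a = sum(g in fg for fg in generated)
        b = sum(g in fg for fg in training)
        # Only groups more frequent than in the training data can be enriched
        if a / n1 <= b / n2:
            continue
        p = fisher_enrichment_p(a, n1, b, n2)
        if p < alpha:
            results.append((g, p))
    return sorted(results, key=lambda t: t[1])  # most significant first


def monitor(
    batches: List[List[Set[str]]],
    training: List[Set[str]],
    alpha: float = 0.01,
) -> List[List[str]]:
    """Characteristic groups for each batch, in generation-time order."""
    return [[g for g, _ in characteristic_groups(batch, training, alpha)]
            for batch in batches]


if __name__ == "__main__":
    # Invented toy labels, loosely echoing the abstract; not data from the study
    training = [{"aniline"}] * 20 + [{"ketone"}] * 20 + [set()] * 60
    early = [{"aniline", "diketone"}] * 15 + [set()] * 5
    late = [{"quinone"}] * 15 + [set()] * 5
    for step, groups in enumerate(monitor([early, late], training)):
        print(step, groups)
```

Reading the per-batch lists in order is what makes a shift in strategy visible. In this toy run, the early batch flags diketone and aniline and the late batch flags quinone. A real pipeline would likely also need a multiple-testing correction, because many groups are tested per batch.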

Keywords: De novo molecule generation; characteristic functional group monitoring; chromophore; deep learning.

Grants and funding

This study was partially supported by a project subsidised by the New Energy and Industrial Technology Development Organization (NEDO) and MEXT as "Priority Issue on Post-K Computer" (Building Innovative Drug Discovery Infrastructure through Functional Control of Biomolecular Systems) and "Program for Promoting Studies on the Supercomputer Fugaku" (MD-driven Precision Medicine), AMED JP20nk0101111, SIP (Technologies for Smart Bio-industry and Agriculture), JST ERATO JPMJER1903, and the Core Research for Evolutional Science and Technology (CREST) program of the Japan Science and Technology Agency (JST), Japan, under Grant JPMJCR19J3. The computations in this study were performed at the supercomputer centres of NIMS and at RAIDEN of AIP (RIKEN).