Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review

Int J Mol Sci. 2023 Jul 15;24(14):11488. doi: 10.3390/ijms241411488.

Abstract

In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to apply machine learning (ML) techniques to predictive molecular design and analysis. This review covers the fundamental aspects of chemoinformatics, explaining the intricate nature of chemical data and the crucial role of molecular descriptors in revealing underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, in conjunction with structure-activity relationships (SARs), are pivotal to small-molecule drug discovery. The technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed. Various ML algorithms, such as regression analysis and support vector machines, are presented for their ability to predict and explain the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. By embracing these technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
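As an illustration of the model validation and performance evaluation steps mentioned above, the following is a minimal sketch, not taken from the review itself. It fits a single-descriptor linear regression and computes the leave-one-out cross-validated q² (1 minus PRESS divided by the total sum of squares), a statistic commonly reported for QSAR models. The function names `fit_line` and `loo_q2` and the toy descriptor and activity values are hypothetical and chosen only for demonstration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one descriptor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model with compound i held out, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    y_bar = mean(ys)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical toy data: one descriptor value and one activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(descriptor, activity)
q2 = loo_q2(descriptor, activity)
print(f"intercept={intercept:.3f} slope={slope:.3f} q2={q2:.3f}")
```

A real ML-QSAR workflow would use many descriptors, a feature selection step, and an external test set in addition to cross-validation; the sketch only shows how a cross-validated statistic differs from a fit on the full data set.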

Keywords: AI/ML; QSAR; QSPR; SAR; biological activity; chemoinformatics; computational validation; molecular descriptors; predictive modeling; small molecules.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Cheminformatics*
  • Drug Discovery* / methods
  • Machine Learning
  • Molecular Structure
  • Quantitative Structure-Activity Relationship

Grants and funding

This research received no external funding.