Effect of Flattened Structures of Molecules and Materials on Machine Learning Model Training

J Chem Inf Model. 2023 Sep 11;63(17):5446-5456. doi: 10.1021/acs.jcim.3c00242. Epub 2023 Aug 25.

Abstract

A key aspect of producing accurate and reliable machine learning models that predict properties from quantum chemistry (QC) data is identifying data characteristics that may negatively influence model training. In previous work, we identified that molecules and materials with a low volume of the convex hull (VCH) of their atomic positions may be harmful in model training and a source of prediction outliers. In this paper, we extend this analysis and develop a biased sampling study that evaluates how the VCH of the structures in a model's training data influences training, across different structures of molecules and materials. Our study confirms that VCH influences model training and shows the importance of using structures with homogeneous geometric characteristics when building new data sets or selecting training sets from larger QC data sets.
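To make the VCH descriptor concrete, the following is a minimal, dependency-free sketch (not the authors' code) that computes the volume of the convex hull of a structure's 3D atomic positions. It uses a brute-force facet search that is only practical for small structures; in practice `scipy.spatial.ConvexHull(points).volume` is a common alternative. Flat or linear structures yield a VCH of 0.0, the low-VCH case the abstract flags as potentially harmful. The `select_by_vch` helper and its `min_vch` threshold are hypothetical illustrations of filtering a training set by geometry, not the biased sampling procedure used in the study.

```python
import itertools
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _hull_area_2d(pts):
    """Area of the 2D convex hull of pts (Andrew's monotone chain + shoelace)."""
    pts = sorted(set(pts))
    if len(pts) < 3:
        return 0.0

    def turn(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def chain(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and turn(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    ring = chain(pts)[:-1] + chain(reversed(pts))[:-1]
    twice_area = sum(ring[i][0] * ring[i - 1][1] - ring[i - 1][0] * ring[i][1]
                     for i in range(len(ring)))
    return abs(twice_area) / 2.0


def convex_hull_volume(coords, eps=1e-9):
    """Volume of the convex hull (VCH) of 3D atomic positions, brute force.

    A hull facet is a plane through three atoms with all atoms on one side;
    facet area * distance to the centroid / 3 is one pyramid of the hull.
    Coplanar or collinear structures give 0.0.
    """
    pts = [tuple(float(x) for x in p) for p in coords]
    n = len(pts)
    if n < 4:
        return 0.0
    centroid = tuple(sum(p[k] for p in pts) / n for k in range(3))
    seen_faces = set()
    volume = 0.0
    for i, j, k in itertools.combinations(range(n), 3):
        edge = _sub(pts[j], pts[i])
        normal = _cross(edge, _sub(pts[k], pts[i]))
        length = math.sqrt(_dot(normal, normal))
        if length < eps:
            continue  # collinear triple defines no plane
        u = tuple(x / length for x in normal)
        dist = [_dot(u, _sub(p, pts[i])) for p in pts]
        if max(dist) > eps and min(dist) < -eps:
            continue  # atoms on both sides: not a hull facet
        face = frozenset(m for m in range(n) if abs(dist[m]) <= eps)
        if face in seen_faces:
            continue  # same facet already reached via another triple
        seen_faces.add(face)
        # Orthonormal in-plane basis (e1, e2) for projecting the facet to 2D.
        e1 = tuple(x / math.sqrt(_dot(edge, edge)) for x in edge)
        e2 = _cross(u, e1)
        flat = [(_dot(_sub(pts[m], pts[i]), e1), _dot(_sub(pts[m], pts[i]), e2))
                for m in face]
        height = abs(_dot(u, _sub(centroid, pts[i])))
        volume += _hull_area_2d(flat) * height / 3.0
    return volume


def select_by_vch(structures, min_vch):
    """Hypothetical filter: keep names of structures with VCH >= min_vch."""
    return [name for name, coords in structures.items()
            if convex_hull_volume(coords) >= min_vch]


cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(round(convex_hull_volume(cube), 6))  # 1.0
print(convex_hull_volume(square))  # 0.0 (flat structure)
print(select_by_vch({"cube": cube, "flat_square": square}, min_vch=0.5))  # ['cube']
```

The brute-force search is O(n^4) in the number of atoms, which keeps the sketch short and self-contained; for larger structures or periodic materials, a dedicated hull library is the more practical choice.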

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning*
  • Molecular Structure*