A prediction model for electrical strength of gaseous medium based on molecular reactivity descriptors and machine learning method

J Mol Model. 2025 Jan 18;31(2):53. doi: 10.1007/s00894-024-06254-y.

Abstract

Context: Ionization and adsorption in gas discharge are similar to electrophilic and nucleophilic reactions. Molecular descriptors that characterize these reactions, for example electrostatic potential descriptors, are useful in predicting the electrical strength of environmentally friendly gases. In this study, descriptors of 73 molecules are used for correlation analysis with electrical strength. These molecular descriptors fall into two types: area-related descriptors and reactivity-related descriptors. The predictive performance of statistical models and machine learning models is then compared. The statistical models include multiple linear regression and polynomial regression, while the machine learning models consist of K-nearest neighbors, random forest, and gradient boosting decision trees. The results indicate that the machine learning models are generally better than the statistical models in predictive accuracy and stability, with gradient boosting decision trees performing best: its coefficient of determination and mean squared error on the testing set after 1000 training iterations are 0.864 and 0.105, respectively. Therefore, molecular reactivity descriptors combined with machine learning methods can effectively predict the electrical strength of gaseous media.
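For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how a coefficient of determination and a mean squared error are conventionally computed from test-set predictions. The function names `mse` and `r_squared` and the sample values are illustrative assumptions; they are not taken from the study and do not reproduce its data or code.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Illustrative (made-up) electrical strength values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"MSE = {mse(observed, predicted):.3f}")
print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

A coefficient of determination close to 1 together with a small mean squared error indicates that the predictions track the observed values closely, which is the sense in which the two figures above are reported for the testing set.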

Methods: Gaussian 16 is used to optimize the molecular structures with the M06-2X functional and def2-series basis sets. Multiwfn is then used for wavefunction analysis to obtain the molecular surface descriptors.

Keywords: Electrical strength; Machine learning; Molecular descriptor; Sulfur hexafluoride (SF6) alternatives.