Dual-Stage Stacking Machine Learning Method Considering Virtual Sample Generation for the Prediction of ZIF-8's BET Specific Surface Area with Experimental Validation

Langmuir. 2025 Jan 17. doi: 10.1021/acs.langmuir.4c04088. Online ahead of print.

Abstract

The widespread application of metal-organic frameworks (MOFs) in wastewater and gas treatment has created a growing demand for accurate and rapid assessment of their BET specific surface area. However, acquiring statistically sufficient data through experiments is often costly and time-consuming. This study therefore proposes a dual-stage stacking model combined with Gaussian mixture model-based virtual sample generation (GMM-VSG) for predicting BET specific surface area. A total of 90 real samples were selected from the MOF database, and 300 virtual samples were generated. Four machine learning models, Bayesian regression (Bayes), adaptive boosting (AdaBoost), random forest (RF), and extreme gradient boosting (XGBoost), were evaluated on both the real and virtual samples. The three best-performing models and a linear regression model were then used to construct the dual-stage stacking model, which achieved an R² of 0.974. Finally, during experimental validation, the experimental conditions were adjusted according to feature importance analysis, and the prediction accuracy for the BET specific surface area was 0.943. This study contributes to the development of more efficient and accurate methods for evaluating the BET specific surface area of MOFs.
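The workflow described in the abstract (fit a Gaussian mixture to the real samples, draw virtual samples from it, then train a stacked ensemble on the combined set) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of the MOF database, and invents the feature count, the target function, the number of mixture components, and the helper name generate_virtual_samples purely for illustration. The abstract does not say which three base models were retained, so the choice of Bayesian ridge, AdaBoost, and random forest here is also an assumption, as is fitting the mixture to the joint feature-target vectors.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.metrics import r2_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for 90 real samples (assumption: 4 hypothetical
# synthesis features and an arbitrary target; not data from the paper).
n_real, n_features = 90, 4
X_real = rng.uniform(0.0, 1.0, size=(n_real, n_features))
y_real = (
    1000.0
    + 800.0 * X_real[:, 0]
    - 400.0 * X_real[:, 1] ** 2
    + 200.0 * X_real[:, 2] * X_real[:, 3]
    + rng.normal(0.0, 20.0, size=n_real)
)

# Hold out real samples first so no virtual sample is derived from them.
X_train, X_test, y_train, y_test = train_test_split(
    X_real, y_real, test_size=0.2, random_state=0
)


def generate_virtual_samples(X, y, n_virtual, n_components=3, seed=0):
    """Fit a GMM to the joint (features, target) vectors and sample from it."""
    joint = np.column_stack([X, y])
    gmm = GaussianMixture(
        n_components=n_components, covariance_type="full", random_state=seed
    )
    gmm.fit(joint)
    samples, _ = gmm.sample(n_virtual)  # second item is the component label
    return samples[:, :-1], samples[:, -1]


X_virt, y_virt = generate_virtual_samples(X_train, y_train, n_virtual=300)

# Stage 1: base learners. Stage 2: linear regression fitted on their
# cross-validated predictions (the stacking step).
base_learners = [
    ("bayes", BayesianRidge()),
    ("ada", AdaBoostRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
]
stack = StackingRegressor(
    estimators=base_learners, final_estimator=LinearRegression(), cv=5
)
stack.fit(np.vstack([X_train, X_virt]), np.concatenate([y_train, y_virt]))

# Score only on held-out real samples.
y_pred = stack.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R2 on held-out real samples: {r2:.3f}")
```

Evaluating only on held-out real samples keeps the score from being inflated by virtual points drawn from the same mixture that shaped the training set. The R² printed by this sketch reflects the synthetic data and is not comparable to the values reported in the abstract.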