Machine learning-based approaches to Vis-NIR data for the automated characterization of petroleum wax blends

Spectrochim Acta A Mol Biomol Spectrosc. 2024 Apr 5:310:123910. doi: 10.1016/j.saa.2024.123910. Epub 2024 Jan 18.

Abstract

Petroleum waxes are products derived from lubricating oils, with a wide spectrum of industrial and consumer applications that depend on their composition. Their intended applications also depend on the practice of blending petroleum waxes with different chemical characteristics (e.g., paraffin waxes and microwaxes) to achieve the appropriate physicochemical properties. This study introduces a novel method based on visible and near-infrared (Vis-NIR) spectroscopy combined with machine learning (ML) for characterizing blends of the two commonly marketed types of petroleum wax (paraffin waxes and microwaxes). From the spectroscopic data, Partial Least Squares Regression (PLSR), Support Vector Regression (SVR), and Random Forest (RF) regression models were developed, yielding satisfactory results for determining the blending percentage in petroleum waxes. In addition, wrapper variable selection methods, namely the Boruta algorithm and a Genetic Algorithm (GA), were applied to assess whether using fewer predictors enhances model performance. Variable selection, specifically with the Boruta algorithm, improved the performance of the models. The Boruta-PLS model performed best, with an RMSE of 2.972 and an R² of 0.9925 for the test set and an RMSE of 1.814 and an R² of 0.9977 for the external validation set. This model also made it possible to establish the relative importance of the variables in characterizing the wax mixture, indicating that the hydrocarbon content ratio is critical in determining this value. An interactive web application was built on the best model so that users can process their data easily.
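
For readers unfamiliar with the two figures of merit quoted above, the following is a minimal sketch of how the root mean squared error (RMSE) and the coefficient of determination are conventionally computed for a regression model. It is not the authors' code: the function names and the blend percentages are hypothetical, chosen only for illustration, and the assumption that the response is expressed as a blending percentage is taken from the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    n = len(y_true)
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted blend percentages (NOT data from the study).
y_true = [0.0, 25.0, 50.0, 75.0, 100.0]
y_pred = [2.0, 24.0, 47.0, 78.0, 99.0]

print("RMSE:", rmse(y_true, y_pred))
print("R2:  ", r_squared(y_true, y_pred))
```

Under this convention, RMSE carries the same units as the response, so a lower RMSE together with a coefficient of determination closer to 1 indicates a better fit.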

Keywords: Boruta algorithm; Genetic Algorithm; Machine learning; Paraffins; Petroleum waxes; Visible-near infrared spectroscopy.