A novel hybrid filter/wrapper method for feature selection in archaeological ceramics classification by laser-induced breakdown spectroscopy

Analyst. 2021 Feb 7;146(3):1023-1031. doi: 10.1039/d0an02045a. Epub 2020 Dec 10.

Abstract

Laser-induced breakdown spectroscopy (LIBS) is valued as an analytical tool in the cultural heritage field owing to its technical advantages, particularly in combination with chemometric methods. Feature selection (FS) is an indispensable pre-processing step in data optimization: it eliminates redundant or irrelevant features from high-dimensional data to enhance the predictive capacity and interpretability of LIBS-based multivariate classification. In this paper, a novel hybrid filter/wrapper method based on the MI-DBS algorithm is proposed to enhance the qualitative analysis performance of the LIBS technique. The proposed method combines the advantages of a filter method based on the mutual information (MI) algorithm and a wrapper method based on the bi-directional selection (DBS) algorithm. The MI algorithm is applied first to remove redundant or uncorrelated features, yielding a simplified input subset. The DBS algorithm then selects further among the retained features to seek an optimal feature subset with good predictive performance. To benefit this feature selection process, the wavelet transform denoising (WTD) method was used to reduce noise in the LIBS spectra. LIBS experiments were performed on 35 archaeological ceramic samples. The proposed hybrid filter/wrapper method was implemented with a random forest (RF) based nonlinear multivariate classification method. In a comparison with several other feature selection methods, the proposed method performed best in terms of both predictive performance and the number of selected features. Finally, the MI-DBS algorithm was used to seek the optimal features from the full spectrum (220-720 nm); the corresponding sensitivity, specificity and accuracy obtained with the RF classifier for the test set were 0.9722, 0.9956 and 0.9850, respectively. Overall, the results show that the MI-DBS algorithm is more effective at improving model performance and at reducing redundant or uncorrelated features and computational time, and that it serves as a good alternative for FS in multivariate classification.
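The two-stage idea described in the abstract (an MI filter followed by a bi-directional wrapper search) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the MI estimator, the cut-off rule, or the exact DBS update rules, so everything below is an assumption. In particular, MI is estimated with equal-width binning, the filter keeps a fixed top `keep` features, the wrapper alternates one forward addition with backward removals, and a leave-one-out nearest-centroid classifier stands in for the random forest so that the example needs only the standard library. All function names (`mutual_information`, `loo_accuracy`, `bidirectional_select`, `mi_dbs`) and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys, bins=4):
    """MI (nats) between an equal-width-binned feature and class labels."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # constant feature: fall back to one bin
    binned = [min(int((x - lo) / width), bins - 1) for x in xs]
    n = len(xs)
    px, py, pxy = Counter(binned), Counter(ys), Counter(zip(binned, ys))
    return sum((c / n) * math.log(c * n / (px[b] * py[y]))
               for (b, y), c in pxy.items())

def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (a simple stand-in for the random forest used in the paper)."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(len(X)):
        sums, counts = {}, Counter()
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(feats))
            for k, f in enumerate(feats):
                acc[k] += row[f]
            counts[label] += 1

        def dist(label):
            # squared distance from sample i to the class centroid
            return sum((X[i][f] - s / counts[label]) ** 2
                       for f, s in zip(feats, sums[label]))

        hits += min(sums, key=dist) == y[i]
    return hits / len(X)

def bidirectional_select(X, y, candidates, score):
    """Alternate a forward addition and backward removals until
    neither step strictly improves the score."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        # Forward step: add the single best remaining candidate, if it helps.
        trials = [(score(X, y, selected + [f]), f)
                  for f in candidates if f not in selected]
        if trials:
            s, f = max(trials, key=lambda t: t[0])
            if s > best:
                selected, best, improved = selected + [f], s, True
        # Backward step: drop any feature whose removal helps.
        for f in list(selected):
            if len(selected) > 1:
                rest = [g for g in selected if g != f]
                s = score(X, y, rest)
                if s > best:
                    selected, best, improved = rest, s, True
    return selected, best

def mi_dbs(X, y, keep=5):
    """Filter by MI ranking (top `keep`), then run the wrapper search."""
    n_feats = len(X[0])
    mi = [mutual_information([row[f] for row in X], y) for f in range(n_feats)]
    ranked = sorted(range(n_feats), key=lambda f: mi[f], reverse=True)[:keep]
    return bidirectional_select(X, y, ranked, loo_accuracy)

# Toy data (hypothetical): feature 0 separates the two classes,
# features 1 and 2 are independent of the label.
y = [i % 2 for i in range(30)]
X = [[10 * (i % 2) + i % 3, (i * 7) % 5, (i // 2) % 3] for i in range(30)]
selected, best = mi_dbs(X, y, keep=2)
print(selected, best)
```

On real LIBS spectra each column of `X` would be the intensity at one wavelength channel, and the scoring function would be replaced by cross-validated accuracy of the actual classifier; the filter stage matters there because the wrapper evaluates the classifier once per candidate feature per iteration.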