Near-infrared (NIR) spectral data are high-dimensional and contain redundant information, so the most representative characteristic wavelengths must be identified before modeling to improve model accuracy and reliability. Many methods exist for selecting characteristic wavelengths in NIR spectroscopy, but collinearity among wavelengths remains a major cause of poor model performance. This study therefore proposes a three-stage wavelength selection algorithm (Stage III) that reduces both the redundancy in NIR spectral data and the collinearity between wavelength variables, yielding a simpler and more accurate predictive model. A public NIR data set of corn samples is used as the test case. In the first stage, the correlation coefficient between each wavelength vector and the concentration vector is calculated, and the wavelengths with the higher correlation coefficients are retained. In the second stage, the correlation coefficients between the retained wavelength vectors are calculated, and the wavelengths that are less correlated with the other wavelengths are selected. In the third stage, stepwise regression selects the wavelengths that contribute substantially to the model, and these are used as the variables of a multiple linear regression (MLR) model. The results show that the model built with the three-stage wavelength selection algorithm outperforms those built with the full spectrum, Stage I, and Stage II; the coefficient of determination of the Stage III-MLR model on the test set reached 0.9360. Compared with the successive projections algorithm (SPA), uninformative variable elimination (UVE), and competitive adaptive reweighted sampling (CARS), Stage III achieves higher prediction accuracy.
Therefore, the three-stage wavelength selection algorithm is an effective method for modeling NIR spectroscopy data: it reduces the collinearity between wavelength variables, simplifies the model, and improves its prediction precision.
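The three stages summarized in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. The abstract does not give thresholds or the exact selection rules, so the following are assumptions: Stage I keeps a fixed number of wavelengths ranked by absolute correlation with the concentration; Stage II ranks candidates by their mean absolute correlation with the other candidates; Stage III uses forward-only stepwise selection with a hypothetical F-to-enter threshold of 4.0. All function names, the `keep` and `f_enter` parameters, and the synthetic data are hypothetical.

```python
import numpy as np

def stage1_target_correlation(X, y, keep):
    """Stage I: keep the `keep` wavelengths most correlated (|r|) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:keep]

def stage2_low_collinearity(X, idx, keep):
    """Stage II (assumed rule): keep the candidates with the lowest mean
    absolute correlation with the other candidates."""
    C = np.abs(np.corrcoef(X[:, idx], rowvar=False))
    np.fill_diagonal(C, 0.0)
    mean_r = C.sum(axis=1) / (len(idx) - 1)
    return idx[np.argsort(mean_r)[:keep]]

def fit_mlr(X, y, cols):
    """Ordinary least squares with intercept; returns (RSS, coefficients)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), coef

def stage3_forward_stepwise(X, y, idx, f_enter=4.0):
    """Stage III (simplified): forward selection with a partial F-test.
    Classical stepwise regression also re-tests variables for removal;
    that backward step is omitted here for brevity."""
    selected, remaining = [], list(idx)
    rss_cur, _ = fit_mlr(X, y, selected)
    n = len(y)
    while remaining:
        # Try each remaining wavelength and take the one with the lowest RSS.
        rss_new, best = min((fit_mlr(X, y, selected + [c])[0], c)
                            for c in remaining)
        dof = n - len(selected) - 2  # residual d.o.f. of the enlarged model
        if dof <= 0:
            break
        f_stat = (rss_cur - rss_new) / (max(rss_new, 1e-12) / dof)
        if f_stat < f_enter:
            break
        selected.append(best)
        remaining.remove(best)
        rss_cur = rss_new
    return selected, fit_mlr(X, y, selected)[1]

# Synthetic stand-in for a spectral matrix (samples x wavelengths).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = 2.0 * X[:, 3] - 1.0 * X[:, 17] + 0.1 * rng.normal(size=200)

s1 = stage1_target_correlation(X, y, keep=10)
s2 = stage2_low_collinearity(X, s1, keep=6)
selected, coef = stage3_forward_stepwise(X, y, s2)
print("Stage I:", s1, "Stage II:", s2, "Stage III:", selected)
```

In this sketch, `coef` holds the intercept followed by one MLR coefficient per selected wavelength. On real spectra, the values of `keep` and `f_enter` would need to be tuned, for example by cross-validation.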
Keywords: Correlation coefficient; Near-infrared spectroscopy; Stepwise regression; Wavelength selection.
Copyright © 2024 Elsevier B.V. All rights reserved.