This work presents multivariate calibration models built with near-infrared (NIR) spectroscopy and partial least squares (PLS) regression to estimate the lignin content in different parts of sugarcane genotypes. Reference lignin contents were determined in the laboratory by the Klason method. The independent variables were NIR spectra acquired directly from each material (dry bagasse, bagasse-with-juice, leaf, and stalk) in the range of 10 000 to 4000 cm⁻¹. The models were built by PLS regression, and several variable selection algorithms were tested and compared: interval PLS (iPLS), backward interval PLS (biPLS), genetic algorithm (GA), and ordered predictors selection (OPS). The best models were obtained with variables selected by the OPS algorithm. The root mean square error of prediction (RMSEP), correlation of prediction (RP), and ratio of performance to deviation (RPD) were, respectively, 0.85, 0.97, and 2.87 for dry bagasse; 0.65, 0.94, and 2.77 for bagasse-with-juice; 0.58, 0.96, and 2.56 for leaf; 0.61, 0.95, and 3.24 for the middle stalk; and 0.58, 0.96, and 2.34 for the top stalk. The OPS algorithm selected fewer variables while giving greater predictive capacity. All the models are reliable and predict lignin in sugarcane with high accuracy, and they significantly reduce analysis time, cost, and chemical reagent consumption, thus optimizing the entire process. In general, the future application of these models should have a positive impact on the biofuels industry, where rapid decision-making is needed for clone production and genetic breeding programs.
Keywords: PLS; sugarcane; lignin; partial least squares regression; stalk; variable selection.
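The three figures of merit reported in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: RMSEP as the root of the mean squared prediction residual, RP as the Pearson correlation between reference and predicted values, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function name `prediction_metrics` and the lignin values are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def prediction_metrics(y_ref, y_pred):
    """Return (RMSEP, RP, RPD) for a prediction set.

    Assumed definitions (not stated in the abstract):
      RMSEP = sqrt(mean((y_pred - y_ref)^2))
      RP    = Pearson correlation between y_ref and y_pred
      RPD   = sample standard deviation of y_ref / RMSEP
    """
    if len(y_ref) != len(y_pred) or len(y_ref) < 2:
        raise ValueError("need two equally sized sets with at least 2 samples")

    # Root mean square error of prediction
    squared_residuals = [(p - r) ** 2 for r, p in zip(y_ref, y_pred)]
    rmsep = math.sqrt(mean(squared_residuals))

    # Pearson correlation between reference and predicted values
    mean_ref, mean_pred = mean(y_ref), mean(y_pred)
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(y_ref, y_pred))
    ss_ref = sum((r - mean_ref) ** 2 for r in y_ref)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    r_p = cov / math.sqrt(ss_ref * ss_pred)

    # Ratio of performance to deviation; infinite if predictions are exact
    rpd = stdev(y_ref) / rmsep if rmsep > 0 else math.inf

    return rmsep, r_p, rpd


# Hypothetical reference (Klason) and NIR-predicted lignin contents
y_ref = [20.0, 22.0, 24.0, 26.0]
y_pred = [21.0, 21.0, 25.0, 25.0]
rmsep, r_p, rpd = prediction_metrics(y_ref, y_pred)
print(f"RMSEP={rmsep:.2f}  RP={r_p:.2f}  RPD={rpd:.2f}")
```

Under these definitions, a lower RMSEP and a higher RP and RPD indicate a better calibration, which is the sense in which the per-material values above are compared.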