Amazonian cacao-clone nibs discrimination using NIR spectroscopy coupled to naïve Bayes classifier and a new waveband selection approach

Spectrochim Acta A Mol Biomol Spectrosc. 2022 Apr 5:270:120815. doi: 10.1016/j.saa.2021.120815. Epub 2021 Dec 29.

Abstract

Near-infrared spectroscopy (NIRS) has proven helpful in the study of rice, tea, cocoa, and other foods because of its versatility and minimal sample treatment. However, the high complexity of the data produced by NIR sensors makes pre-treatments necessary, such as feature selection techniques that produce compact profiles. Supervised and unsupervised techniques have been tested; each yields a different subset of features, which affects the performance of classifiers built on such compact profiles. We therefore propose and test a new covering array feature selection (CAFS) algorithm coupled to the naïve Bayes classifier (NBC) to discriminate among Amazonian cacao nibs from six cacao clones. The CAFS wrapper approach searches for the wavebands that maximize the F1-score and are therefore most relevant for classification. For this purpose, cacao pods of six varieties were collected, and their beans were extracted and processed (fermented, dried, roasted, and milled) to obtain cacao nibs. NIR spectral profiles in the range of 1100-2500 nm were then acquired for each clone, and relevant wavebands were selected using the proposed CAFS algorithm. For comparison, two standard feature selection techniques were implemented: multi-cluster feature selection (MCFS) and eigenvector centrality feature selection (ECFS). Based on the variables selected by each method, three NBCs were built and compared using statistical metrics. With the wavebands selected by CAFS, the NBC achieved an average accuracy of 99.63%, higher than the 94.92% and 95.79% obtained with ECFS and MCFS, respectively. These results show that the wavebands selected by the proposed CAFS algorithm gave a better fit than other feature selection methods reported in the literature.
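The wrapper idea in the abstract (score a candidate set of wavebands by the F1 of a naïve Bayes classifier trained on it, and keep the set that scores best) can be illustrated with a minimal sketch. This is not the CAFS algorithm: the abstract does not specify the covering-array search, so a plain greedy forward search stands in for it. The data are synthetic, and every name here (`fit_gaussian_nb`, `macro_f1`, `greedy_wrapper`, `make_toy_spectra`) is hypothetical. The Gaussian likelihood and the macro-averaged F1 are likewise assumptions.

```python
import math
import random
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a log-prior plus per-band mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest naive Bayes log-posterior."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

def score_subset(subset, X_tr, y_tr, X_va, y_va):
    """Wrapper criterion: validation macro-F1 of an NBC using only `subset`."""
    def pick(X):
        return [[row[j] for j in subset] for row in X]
    model = fit_gaussian_nb(pick(X_tr), y_tr)
    preds = [predict(model, row) for row in pick(X_va)]
    return macro_f1(y_va, preds)

def greedy_wrapper(X_tr, y_tr, X_va, y_va, max_bands):
    """Greedy forward search (a stand-in, NOT the covering-array search)."""
    selected, best = [], 0.0
    remaining = list(range(len(X_tr[0])))
    while remaining and len(selected) < max_bands:
        candidates = [(score_subset(selected + [j], X_tr, y_tr, X_va, y_va), j)
                      for j in remaining]
        # Highest score wins; ties go to the lowest band index.
        top_score, top_j = max(candidates, key=lambda c: (c[0], -c[1]))
        if top_score <= best:
            break  # no remaining band improves the F1-score
        selected.append(top_j)
        remaining.remove(top_j)
        best = top_score
    return selected, best

def make_toy_spectra(rng, n_per_class=40):
    """Synthetic 6-band 'spectra' for 3 classes; only bands 1 and 4 carry signal."""
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(6)]
            row[1] += 5.0 * (label == 1)  # band 1 separates class 1
            row[4] += 5.0 * (label == 2)  # band 4 separates class 2
            X.append(row)
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_toy_spectra(rng)
X_valid, y_valid = make_toy_spectra(rng)
selected, f1 = greedy_wrapper(X_train, y_train, X_valid, y_valid, max_bands=3)
print("selected band indices:", selected, "validation macro-F1:", round(f1, 3))
```

On this toy data the search should pick up the two informative bands first. With real NIR spectra the candidate space is far larger, which is the motivation for a structured search such as the covering-array approach the authors propose.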

Keywords: Amazon cacao nibs; Chemometrics; Discrimination; NIR spectroscopy; Waveband selection.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Cacao*
  • Clone Cells
  • Spectroscopy, Near-Infrared