Classifiability analysis of spectroscopic profiling datasets in food safety related discriminative tasks

Yinsheng Zhang; Xudong Yang; Zhengyong Zhang; Haiyan Wang

doi:10.1016/j.jfp.2024.100407

J Food Prot. 2024 Nov 13:100407. doi: 10.1016/j.jfp.2024.100407. Online ahead of print.

Authors

Yinsheng Zhang¹, Xudong Yang², Zhengyong Zhang³, Haiyan Wang⁴

Affiliations

¹ Zhejiang Food and Drug Quality & Safety Engineering Research Institute, Zhejiang Gongshang University, Hangzhou, 310018, China.
² School of Management and E-Business, Zhejiang Gongshang University, Hangzhou, 310018, China.
³ School of Management Science and Engineering, Nanjing University of Finance and Economics, Nanjing, 210023, China.
⁴ Zhejiang Food and Drug Quality & Safety Engineering Research Institute, Zhejiang Gongshang University, Hangzhou, 310018, China.

PMID: 39547580
DOI: 10.1016/j.jfp.2024.100407

Abstract

Discriminative tasks, i.e., the identification of different food materials, brands, and origins, have become an essential part of food safety control. In recent years, spectroscopic profiling combined with machine learning has become popular for food-related discriminative tasks, but finding an appropriate classification model can be challenging. As an alternative to the current "trial-and-error" practice, this paper proposes a dedicated two-step classifiability analysis framework. The first step collects more than 90 diverse metrics that measure dataset separability from different perspectives. The second step synthesizes these metrics into a single quantitative score using meta-learner and decomposition-based strategies. Two Raman spectroscopic profiling case studies were conducted to validate the method: the easily separable liquor dataset scored around 1.0, whereas the more challenging table salt dataset scored below 0.5. This score can guide researchers in determining the required model complexity and in assessing the adequacy of the current physicochemical profiling instrument. We expect the proposed classifiability analysis framework to generalize to a wide range of machine learning applications in the food domain where data-driven classification or discriminative tasks are involved.

Keywords: dataset separability; discriminative task; food safety; spectroscopic profiling.
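
The two-step idea described in the abstract (compute several separability metrics, then fold them into one score) can be illustrated with a minimal sketch. This is not the authors' implementation: the three metrics below are common textbook choices standing in for the paper's 90+ metrics, a plain average of squashed values stands in for the meta-learner and decomposition-based strategies, and the data are synthetic Gaussian features rather than Raman spectra. All function names are hypothetical, and the resulting numbers are not on the same scale as the scores reported in the abstract.

```python
import numpy as np


def fisher_ratio(X, y):
    """Largest per-feature Fisher discriminant ratio between the two classes."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return float(np.max(num / den))


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample may not be its own neighbour
    return float(np.mean(y[dist.argmin(axis=1)] == y))


def centroid_margin(X, y):
    """Distance between class centroids divided by the mean within-class spread."""
    a, b = X[y == 0], X[y == 1]
    between = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    within = 0.5 * (
        np.linalg.norm(a - a.mean(axis=0), axis=1).mean()
        + np.linalg.norm(b - b.mean(axis=0), axis=1).mean()
    )
    return float(between / (within + 1e-12))


def toy_classifiability_score(X, y):
    """Step 1: compute metrics. Step 2: squash each to [0, 1) and average.

    The plain average is an assumption made for illustration only; the paper
    uses meta-learner and decomposition-based strategies instead.
    """
    f = fisher_ratio(X, y)
    m = centroid_margin(X, y)
    acc = loo_1nn_accuracy(X, y)
    squashed = [
        f / (1.0 + f),
        m / (1.0 + m),
        max(0.0, 2.0 * acc - 1.0),  # 0 at chance level for balanced classes
    ]
    return float(np.mean(squashed))


def make_synthetic_data(shift, n_per_class=60, n_features=20, seed=0):
    """Two Gaussian classes whose means differ by `shift` in every feature."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    X1 = rng.normal(shift, 1.0, size=(n_per_class, n_features))
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)


easy_score = toy_classifiability_score(*make_synthetic_data(shift=3.0))
hard_score = toy_classifiability_score(*make_synthetic_data(shift=0.1))
print(f"well-separated classes: {easy_score:.2f}")
print(f"overlapping classes:    {hard_score:.2f}")
```

In this toy setting the well-separated dataset receives the higher score, which mirrors the qualitative pattern the abstract reports for the liquor and table salt datasets. A low score would suggest either a more expressive classifier or a more informative measurement is needed.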