Traceability and mislabelling of geographical origin are known to be major fraud concerns in the black tea sector. Discriminating among geographical indications (GIs) is challenging because of the complexity of chemical fingerprints in multi-class metabolomics analysis. In this study, 302 black tea samples were collected from 9 main GI cultivation regions. A comprehensive non-targeted fingerprinting workflow was built on liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QToF), and conventional chemometric modelling was compared with machine learning. 229 and 145 metabolites were selected as biomarkers, and model robustness and performance were further validated through internal 7-fold cross-validation and external validation, both of which showed 100 % accuracy in discriminating GI origin. This research provides a novel solution for enhancing transparency and traceability in the black tea supply chain in laboratory settings. Furthermore, the proposed biomarker selection workflow offers insights for future machine learning-based non-targeted metabolomics research.
Keywords: Biomarker selection; Black tea; Geographical origin authentication; LC-QToF; Machine learning; Non-targeted metabolomics.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.