Promoting LC-QToF based non-targeted fingerprinting and biomarker selection with machine learning for the discrimination of black tea geographical origin

Food Chem. 2025 Feb 15;465(Pt 2):142088. doi: 10.1016/j.foodchem.2024.142088. Epub 2024 Nov 23.

Abstract

Traceability and the mislabelling of the geographical origin of black tea are major fraud concerns in the sector. Discriminating among geographical indications (GIs) can be challenging because of the complexity of chemical fingerprints in multi-class metabolomics analysis. In this study, 302 black tea samples were collected from 9 main GI cultivation regions. A comprehensive non-targeted fingerprinting workflow was built on liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QToF), and conventional chemometric modelling was compared with machine learning. 229 and 145 metabolites were selected as biomarkers, and model robustness and performance were further validated through internal 7-fold cross-validation and external validation, with 100 % accuracy in discriminating GI origin in both. This research provides a novel solution for enhancing transparency and traceability in the black tea supply chain in laboratory settings. Furthermore, the proposed biomarker selection workflow offers insights for future machine learning-based non-targeted metabolomics research.
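The validation scheme named in the abstract (biomarker selection, internal 7-fold cross-validation, external validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the abstract does not name the selection criterion or the classifier, so a simple between-class/within-class variance ranking and a nearest-centroid classifier stand in for them, and the feature table is synthetic. The function names (`stratified_folds`, `select_features`, `nearest_centroid_accuracy`, `cross_validate`) are hypothetical.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds, spreading each class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def select_features(X, y, n_keep):
    """Stand-in biomarker selection: rank features by the ratio of
    between-class to within-class variance and keep the top n_keep."""
    classes = sorted(set(y))
    scores = []
    for f in range(len(X[0])):
        groups = [[row[f] for row, lab in zip(X, y) if lab == c] for c in classes]
        between = pvariance([fmean(g) for g in groups])
        within = fmean(pvariance(g) for g in groups)
        scores.append(between / (within + 1e-12))
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return ranked[:n_keep]


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Stand-in classifier: assign each test sample to the closest class mean."""
    centroids = {}
    for c in set(y_tr):
        rows = [r for r, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [fmean(r[f] for r in rows) for f in feats]

    def sq_dist(row, c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, centroids[c]))

    hits = sum(min(centroids, key=lambda c: sq_dist(r, c)) == lab
               for r, lab in zip(X_te, y_te))
    return hits / len(y_te)


def cross_validate(X, y, k=7, n_keep=8, seed=0):
    """Internal k-fold CV; features are re-selected inside every fold so the
    held-out fold never influences which features are kept."""
    accs = []
    for test_idx in stratified_folds(y, k, seed):
        held = set(test_idx)
        tr = [i for i in range(len(y)) if i not in held]
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        feats = select_features(X_tr, y_tr, n_keep)
        accs.append(nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, feats))
    return accs


# Synthetic stand-in for an LC-QToF feature table: 302 samples, 9 classes,
# 40 features of which the first 8 carry class information (all assumed).
rng = random.Random(0)
n_classes, n_features, n_informative = 9, 40, 8
class_means = [[rng.gauss(0, 3) for _ in range(n_informative)]
               for _ in range(n_classes)]
X, y = [], []
for i in range(302):
    c = i % n_classes
    X.append([(class_means[c][f] if f < n_informative else 0.0) + rng.gauss(0, 1)
              for f in range(n_features)])
    y.append(c)

# The last 60 samples act as the external validation set.
X_int, y_int, X_ext, y_ext = X[:242], y[:242], X[242:], y[242:]
cv_accs = cross_validate(X_int, y_int, k=7, n_keep=8)
final_feats = select_features(X_int, y_int, 8)
ext_acc = nearest_centroid_accuracy(X_int, y_int, X_ext, y_ext, final_feats)
print(f"mean 7-fold CV accuracy: {fmean(cv_accs):.3f}, external accuracy: {ext_acc:.3f}")
```

The design point worth noting is that feature selection runs inside each fold. Selecting biomarkers on the full dataset before cross-validation would let the held-out samples influence the chosen features and inflate the internal accuracy estimate. The external set is touched only once, after selection and training on the internal data.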

Keywords: Biomarker selection; Black tea; Geographical origin authentication; LC-QToF; Machine learning; Non-targeted metabolomics.

Publication types

  • Evaluation Study

MeSH terms

  • Biomarkers* / analysis
  • Camellia sinensis* / chemistry
  • Camellia sinensis* / classification
  • Chromatography, High Pressure Liquid
  • Chromatography, Liquid
  • Machine Learning*
  • Mass Spectrometry*
  • Metabolomics*
  • Tea* / chemistry

Substances

  • Biomarkers
  • Tea