Analysis of volatile compounds and vintage discrimination of raw Pu-erh tea based on GC-IMS and GC-MS combined with data fusion

J Chromatogr A. 2025 Jan 14;1743:465683. doi: 10.1016/j.chroma.2025.465683. Online ahead of print.

Abstract

Storage duration significantly influences the aroma profile of raw Pu-erh tea. To comprehensively characterize the differences in volatile compounds among vintages of raw Pu-erh tea and to enable rapid vintage classification, the volatile compounds of raw Pu-erh teas from different years (2020-2023) were analyzed using gas chromatography-ion mobility spectrometry (GC-IMS) and gas chromatography-mass spectrometry (GC-MS). The datasets obtained from the two techniques were integrated through low-level and mid-level data fusion strategies. Partial least squares discriminant analysis (PLS-DA) and random forest (RF) machine learning algorithms were then applied to develop predictive models for classifying tea storage duration. GC-IMS and GC-MS identified 54 and 76 volatile compounds, respectively. Notably, the RF model, particularly when coupled with mid-level data fusion, predicted tea storage time with high accuracy, reaching 100%. These findings provide a reference for elucidating the aroma characteristics of raw Pu-erh tea of different vintages and demonstrate that data fusion combined with machine learning has great potential for ensuring food quality.
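
The difference between the two fusion strategies named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state how the data blocks were scaled or how features were extracted for mid-level fusion, so the autoscaling step, the Fisher-ratio variable selection, the function names, and the toy numbers below are all assumptions made for illustration. Low-level fusion is shown as concatenating the scaled GC-IMS and GC-MS blocks in full; mid-level fusion is shown as first reducing each block to a few discriminating variables and then concatenating them.

```python
from statistics import mean, pstdev


def autoscale(block):
    """Mean-centre each column and scale it to unit variance.

    Columns with zero variance are set to 0.0 to avoid division by zero.
    """
    cols = list(zip(*block))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in block
    ]


def low_level_fusion(block_a, block_b):
    """Low-level fusion: concatenate the full (scaled) blocks sample by sample."""
    return [ra + rb for ra, rb in zip(autoscale(block_a), autoscale(block_b))]


def fisher_score(column, labels):
    """Between-class over within-class sum of squares for one variable."""
    grand = mean(column)
    between = within = 0.0
    for cls in set(labels):
        vals = [x for x, y in zip(column, labels) if y == cls]
        mu = mean(vals)
        between += len(vals) * (mu - grand) ** 2
        within += sum((x - mu) ** 2 for x in vals)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def select_top_k(block, labels, k):
    """Keep the k columns with the highest Fisher score (assumed selection rule)."""
    scaled = autoscale(block)
    cols = list(zip(*scaled))
    ranked = sorted(
        range(len(cols)),
        key=lambda j: fisher_score(cols[j], labels),
        reverse=True,
    )
    keep = sorted(ranked[:k])  # preserve original column order
    return [[row[j] for j in keep] for row in scaled]


def mid_level_fusion(block_a, block_b, labels, k):
    """Mid-level fusion: extract features per block, then concatenate them."""
    a = select_top_k(block_a, labels, k)
    b = select_top_k(block_b, labels, k)
    return [ra + rb for ra, rb in zip(a, b)]


# Toy data (invented for illustration): 4 samples, 2 vintages.
labels = [2020, 2020, 2023, 2023]
gcims = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.9], [3.0, 5.5, 0.4], [3.1, 4.5, 0.8]]
gcms = [[10.0, 7.0], [11.0, 2.0], [20.0, 6.0], [21.0, 3.0]]

low = low_level_fusion(gcims, gcms)             # 4 samples x 5 variables
mid = mid_level_fusion(gcims, gcms, labels, 1)  # 4 samples x 2 variables
```

Either fused matrix could then be passed to a classifier such as PLS-DA or RF. Mid-level fusion gives the classifier fewer, more informative variables, which is one plausible reason it can outperform simple concatenation.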

Keywords: Data fusion; GC-IMS; GC-MS; Machine learning; Raw Pu-erh tea; Storage time; Volatile compounds.