Ginseng, which is rich in ginsenosides, grows mainly in Jilin, Liaoning, and Heilongjiang provinces in China. The quality and traits of ginseng from different origins have been reported to differ greatly. To date, accurately predicting the origin of ginseng samples remains a challenge. Here, we integrated ultra-high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS) with a support vector machine (SVM) for rapid discrimination and prediction of ginseng from the three main regions where it is cultivated in China. First, we developed and optimized a stable and reliable UHPLC-Q-TOF-MS method to obtain robust information for 31 batches of ginseng samples. Subsequently, a pre-processing method was established for the rapid screening and identification of 69 characteristic ginsenosides in the 31 batches of ginseng samples from the three origins. The SVM model successfully distinguished ginseng origin, and its accuracy improved from 83% to 100% after the normalization method was optimized. Six crucial quality markers for the different origins of ginseng were screened using a permutation importance algorithm in the SVM model. To validate the method, eight batches of test samples were used to predict the region of cultivation with the SVM model based on the six selected quality markers. These results indicate that the proposed strategy is suitable for the discrimination and prediction of the origin of ginseng samples.
Keywords: UHPLC-Q-TOF-MS; discrimination; ginseng; support vector machine.
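
The modelling steps summarized in the abstract (normalization, SVM classification, and marker screening by permutation importance) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses `StandardScaler` as a stand-in for the unspecified normalization method, assumes a linear kernel, and runs on synthetic data generated by a hypothetical helper, `make_synthetic_data`, rather than on measured ginsenoside intensities. The function name `fit_and_rank` is likewise illustrative.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_data(n_per_class=30, n_features=69, n_informative=6, seed=0):
    """Hypothetical stand-in for a samples x ginsenosides intensity table.

    Three classes play the role of the three origins. Only the first
    `n_informative` columns carry class information (an assumption made
    purely for illustration).
    """
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), n_per_class)
    X = rng.normal(size=(y.size, n_features))
    for c in range(3):
        X[y == c, :n_informative] += 3.0 * c
    # Give every feature a different scale so that normalization matters.
    scales = rng.uniform(1.0, 1000.0, size=n_features)
    return X * scales, y


def fit_and_rank(X, y, n_top=6, seed=0):
    """Fit a normalized SVM and rank features by permutation importance."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # StandardScaler and the linear kernel are assumptions; the abstract
    # does not state which normalization or kernel was used.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)

    # Shuffle one feature at a time on held-out data and record the
    # resulting drop in accuracy, averaged over several repeats.
    result = permutation_importance(
        model, X_test, y_test, n_repeats=20, random_state=seed
    )
    top = np.argsort(result.importances_mean)[::-1][:n_top]
    return accuracy, top


if __name__ == "__main__":
    X, y = make_synthetic_data()
    accuracy, top = fit_and_rank(X, y)
    print(f"held-out accuracy: {accuracy:.2f}")
    print("top-ranked feature indices:", top.tolist())
```

Wrapping the scaler and the classifier in a single pipeline means the normalization parameters are learned from the training split only, which avoids leaking information from the held-out samples into the model.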