Caffeine, quinic acid, and nicotinic acid are among the significant chemical determinants of coffee quality. This study develops a chemometric model to quantify these compounds in ternary mixtures analyzed by terahertz time-domain spectroscopy (THz-TDS). A data set of 480 THz spectra was obtained from 80 samples. Combinations of data preprocessing methods, comprising normalization (Z-score, min-max scaling, Mie baseline removal) and dimensionality reduction (principal component analysis (PCA), factor analysis (FA), independent component analysis (ICA), locally linear embedding (LLE), non-negative matrix factorization (NMF), Isomap), with prediction models (partial least-squares regression (PLSR), support vector regression (SVR), multilayer perceptron (MLP), convolutional neural network (CNN), gradient boosting) were evaluated for their prediction performance, for a total of 4,711,685 combinations. The best quantification performance, a root-mean-square error of prediction (RMSEP) of 0.0254 (dimensionless mass ratio), was obtained using min-max scaling and factor analysis for data preprocessing and a multilayer perceptron for prediction. The effects of preprocessing, a comparison of the prediction models, and the linearity of the data are discussed.
© 2022 The Authors. Published by American Chemical Society.
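
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal, illustrative sketch of min-max scaling and of the RMSEP metric, using only the Python standard library. It is not the authors' code: the function names are hypothetical, the scaling is assumed to be applied per spectral feature (column) with bounds taken from the training set, and the toy numbers are invented for illustration rather than taken from the study.

```python
import math


def min_max_fit(rows):
    """Return the (min, max) of each column, computed from training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def min_max_apply(rows, bounds):
    """Scale each column to [0, 1] with the given bounds.

    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    """
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, bounds)
        ])
    return scaled


def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over a held-out set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy "spectra" with two features each (invented values, illustration only).
toy_spectra = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
bounds = min_max_fit(toy_spectra)
scaled_spectra = min_max_apply(toy_spectra, bounds)

# Toy reference and predicted mass ratios (invented values, illustration only).
toy_error = rmsep([0.0, 0.0], [0.3, 0.4])
```

Because the targets are mass ratios, the RMSEP computed this way is dimensionless, which is consistent with how the abstract reports it. In a full pipeline the scaled spectra would pass through a dimensionality-reduction step and a regression model before the error is computed; those stages are omitted here.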