Kernel-Based, Partial Least Squares Quantitative Structure-Retention Relationship Model for UPLC Retention Time Prediction: A Useful Tool for Metabolite Identification

Anal Chem. 2016 Oct 4;88(19):9510-9517. doi: 10.1021/acs.analchem.6b02075. Epub 2016 Sep 14.

Abstract

We propose a new quantitative structure-retention relationship (QSRR) model, based on a kernel-based partial least-squares (KPLS) method, for predicting ultra-performance liquid chromatography (UPLC) retention times in reversed-phase mode. The model was built from a combination of classical (physicochemical and topological) and nonclassical (fingerprint) molecular descriptors for 1383 compounds, encompassing different chemical classes and structures, together with their accurately measured retention times. After randomly splitting the data set into a training set and a test set, we tested the model's ability to predict the retention times of all the compounds. The best R2 value between predicted and experimental retention times was higher than 0.86, and the best Q2 value observed was close to 0.84. A comparison with traditional, simpler multiple linear regression (MLR) and partial least-squares (PLS) regression models shows that KPLS performs better in terms of correlation (R2), prediction (Q2), and support for metabolite identification (MetID) peak assignment. The KPLS model succeeded in two real-life MetID tasks by correctly predicting the elution order of Phase I metabolites, including isomeric monohydroxylated compounds. We also show that the model's predictive power can be extended to different gradient profiles by simple mathematical extrapolation using a known equation, which offers very broad flexibility. Moreover, the study includes an in-depth investigation of the different types of chemical descriptors used to build the structure-retention relationship.
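To make the workflow in the abstract concrete (descriptor matrix, random train/test split, KPLS fit, R2 and Q2), here is a minimal NumPy sketch under stated assumptions. The abstract does not specify the kernel, the number of latent variables, the descriptor software, or the exact R2/Q2 definitions, so this sketch assumes a Gaussian (RBF) kernel, a single-response NIPALS-style kernel PLS on a centered Gram matrix, R2 as the squared Pearson correlation on the training set, and Q2 as 1 - PRESS/SS on held-out compounds. The data are synthetic stand-ins, not the paper's 1383 compounds, and the names `KernelPLS`, `r2_corr`, and `q2_test` are hypothetical, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelPLS:
    """Single-response kernel PLS on a centered Gram matrix (illustrative sketch)."""

    def __init__(self, n_components=5, gamma=0.1):
        self.n_components = n_components
        self.gamma = gamma

    def fit(self, X, y):
        n = X.shape[0]
        self.X_train = X
        self.K_raw = rbf_kernel(X, X, self.gamma)
        C = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix I - 11^T/n
        Kc = C @ self.K_raw @ C                     # kernel centered in feature space
        self.y_mean = y.mean()
        yc = y - self.y_mean
        K_a, y_a = Kc.copy(), yc.copy()
        T, U = [], []
        for _ in range(self.n_components):
            # With a single response, the y-score u is the normalized residual of y.
            u = y_a / np.linalg.norm(y_a)
            t = K_a @ u
            t /= np.linalg.norm(t)                  # unit-norm latent variable (x-score)
            T.append(t)
            U.append(u)
            D = np.eye(n) - np.outer(t, t)          # projector that removes direction t
            K_a = D @ K_a @ D                       # deflate the kernel
            y_a = y_a - t * (t @ y_a)               # deflate the response
        self.T, self.U = np.column_stack(T), np.column_stack(U)
        # Dual regression coefficients: alpha = U (T^T Kc U)^{-1} T^T yc
        self.alpha = self.U @ np.linalg.solve(self.T.T @ Kc @ self.U, self.T.T @ yc)
        return self

    def predict(self, X_new):
        Kt = rbf_kernel(X_new, self.X_train, self.gamma)
        # Center the new kernel rows using training-set statistics only.
        Kt_c = (Kt - Kt.mean(axis=1, keepdims=True)
                - self.K_raw.mean(axis=0)[None, :] + self.K_raw.mean())
        return Kt_c @ self.alpha + self.y_mean


def r2_corr(y_obs, y_pred):
    """Squared Pearson correlation between experimental and predicted values."""
    return np.corrcoef(y_obs, y_pred)[0, 1] ** 2


def q2_test(y_obs, y_pred):
    """1 - PRESS / total sum of squares, computed on held-out compounds."""
    press = np.sum((y_obs - y_pred) ** 2)
    return 1.0 - press / np.sum((y_obs - y_obs.mean()) ** 2)


# Synthetic stand-in data: 200 "compounds" x 6 descriptors (NOT the paper's data set).
rng = np.random.default_rng(0)
physchem = rng.normal(size=(200, 4))                           # continuous descriptors
fingerprint = rng.integers(0, 2, size=(200, 2)).astype(float)  # binary fingerprint bits
X_all = np.hstack([physchem, fingerprint])
rt = (2.0 + np.tanh(physchem[:, 0]) + 0.3 * physchem[:, 1] ** 2
      + 0.2 * fingerprint[:, 0] + rng.normal(scale=0.05, size=200))  # synthetic RTs (min)

# Random train/test split; autoscale descriptors with training statistics only.
idx = rng.permutation(200)
train, test = idx[:160], idx[160:]
mu, sd = X_all[train].mean(axis=0), X_all[train].std(axis=0)
X = (X_all - mu) / sd

model = KernelPLS(n_components=5, gamma=0.1).fit(X[train], rt[train])
r2 = r2_corr(rt[train], model.predict(X[train]))
q2 = q2_test(rt[test], model.predict(X[test]))
print(f"R2 (train) = {r2:.3f}, Q2 (test) = {q2:.3f}")
```

The prediction step needs only the kernel between new compounds and the training compounds, so nonclassical descriptors such as fingerprints could in principle also be handled with a different kernel (for example, a Tanimoto kernel) without changing the PLS part; that substitution is likewise an assumption, not something the abstract reports.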

MeSH terms

  • Algorithms
  • Chromatography, Liquid*
  • Least-Squares Analysis
  • Models, Chemical*
  • Principal Component Analysis