Accurate prediction of isothermal gas chromatographic Kováts retention indices

J Chromatogr A. 2023 Aug 30:1705:464176. doi: 10.1016/j.chroma.2023.464176. Epub 2023 Jun 24.

Abstract

We describe Retention Index Predictor (RIpred) (https://ripred.ca), a freely available web server that rapidly and accurately predicts gas chromatographic Kováts retention indices (RI) from chemical structures supplied as SMILES strings. RIpred predicts RI for three stationary phases (semi-standard non-polar (SSNP), standard non-polar (SNP), and standard polar (SP)) for both derivatized (trimethylsilyl (TMS) and tert‑butyldimethylsilyl (TBDMS)) and underivatized (base compound) forms of GC-amenable structures. RIpred was developed to address the need for freely available, fast, highly accurate RI predictions for a wide range of derivatized and underivatized chemicals for all common GC stationary phases. RIpred uses a Graph Neural Network (GNN) trained on compound structures, their extracted features (mostly atom-level features), and GC-RI data from the National Institute of Standards and Technology databases (NIST 17 and NIST 20). We curated the NIST 17 and NIST 20 GC-RI data, which are available for all three stationary phases, to create the inputs (molecular graphs in this case) needed to enhance model performance. The performance of the different RIpred predictive models was evaluated using 10-fold cross-validation (CV). The best-performing RIpred models, when tested on hold-out test sets from all stationary phases, achieved a Mean Absolute Error (MAE) of <73 RI units (SSNP: 16.5-29.5, SNP: 38.5-45.9, SP: 46.52-72.53). The Mean Absolute Percentage Error (MAPE) of these models was typically within 3% (SSNP: 0.78-1.62%, SNP: 1.87-2.88%, SP: 2.34-4.05%). When compared to the best-performing model by Qu et al., 2021, RIpred performed similarly (MAE of 16.57 RI units [RIpred] vs. 16.84 RI units [Qu et al., 2021 predictor] for derivatized compounds). RIpred also includes ∼5 million predicted RI values for all GC-amenable compounds (∼57,000) in the Human Metabolome Database HMDB 5.0 (Wishart et al., 2022).
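
For readers unfamiliar with the quantities reported above, the following is a minimal Python sketch of the standard isothermal Kováts retention index formula and of the two error metrics (MAE and MAPE). It is not the RIpred implementation. The function names, argument names, and the example numbers are illustrative assumptions and do not come from the paper.

```python
import math


def kovats_index(t_x, t_n, t_n1, n):
    """Isothermal Kovats retention index (standard textbook definition).

    t_x  : adjusted retention time of the analyte
    t_n  : adjusted retention time of the n-alkane with n carbons eluting before it
    t_n1 : adjusted retention time of the n-alkane with n + 1 carbons eluting after it
    All names are hypothetical; times must be positive and in the same units.
    """
    fraction = (math.log(t_x) - math.log(t_n)) / (math.log(t_n1) - math.log(t_n))
    return 100.0 * (n + fraction)


def mean_absolute_error(observed, predicted):
    """MAE in RI units: mean of |observed - predicted|."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed, predicted):
    """MAPE in percent: mean of |observed - predicted| / |observed|, times 100."""
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up values for illustration only (not data from the paper).
    observed_ri = [1000.0, 1200.0]
    predicted_ri = [1010.0, 1180.0]
    print(mean_absolute_error(observed_ri, predicted_ri))
    print(mean_absolute_percentage_error(observed_ri, predicted_ri))
```

Because MAPE divides each error by the observed RI, the same absolute error gives a smaller percentage for compounds with larger RI values, so the two metrics are best read together.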

Keywords: Derivatization; Gas chromatography mass spectrometry; Graph neural network; Kováts retention index.

MeSH terms

  • Chromatography, Gas / methods
  • Databases, Factual
  • Humans
  • Metabolome*
  • Neural Networks, Computer*