A generalizable methodology for predicting retention time of small molecule pharmaceutical compounds across reversed-phase HPLC columns

J Chromatogr A. 2025 Feb 8:1742:465628. doi: 10.1016/j.chroma.2024.465628. Epub 2024 Dec 30.

Abstract

Quantitative structure retention relation (QSRR) is an active field of research, primarily focused on predicting chromatographic retention time (Rt) from the molecular structure of an input analyte on a single or limited number of reversed-phase HPLC (RP-HPLC) columns. However, in pharmaceutical chemistry, manufacturing, and controls (CMC) settings, single-column QSRR models are often insufficient. It is important to translate retention time across different HPLC methods, specifically different stationary phases (SP) and mobile phases (MP), to guide HPLC method development and to bridge organic impurity profiles across different development phases and laboratories. In response to this need, we present a novel approach for retention time transfer across SPs and MPs that does not require pre-existing Rt data on the target column. To achieve this, we developed an RP-HPLC based Genentech Multi-column Retention Time (GMCRT) database containing 51 small molecule pharmaceutical compounds analyzed on twenty SPs at multiple pH levels. The database incorporates the SP selectivity parameters from the Hydrophobic Subtraction Model (HSM): hydrophobicity (H), steric hindrance (S), hydrogen-bond acidity (A), hydrogen-bond basicity (B), and ionic interaction (C) at two pH values (2.8 and 7), together with the ethylbenzene (EB) retention factor. Two machine learning approaches, partial least squares (PLS) and artificial neural networks (ANN), were found to improve the accuracy of Rt prediction on new SPs compared to the previously published direct mapping approach, especially when the RP-HPLC columns differ significantly in selectivity. By comparison, our approach does not require pre-existing retention data on the target SPs and is generalizable to any RP-HPLC column with a known set of column selectivity parameters (https://www.hplccolumns.org/).
This generalizability is achieved not only through the correlations among the retention data on the twenty commonly used RP-HPLC columns in GMCRT, but also by retraining our ML models: the Rt values of the compounds of interest on the source columns are added to GMCRT, and Rt on the target column is then predicted. Thus, we propose a new QSRR framework that incorporates the physicochemical properties of SPs and MPs and makes retention time prediction transferable across SPs and MPs. Such a framework is expected to open up possibilities for developing more comprehensive and generalizable models, and to streamline RP-HPLC method development and lifecycle management across various pharmaceutical CMC development phases.
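
To illustrate how the column descriptors named above (H, S, A, B, C and the EB retention factor) can enter a retention estimate, the following minimal sketch evaluates the general HSM relation log10(k / k_EB) = eta'*H - sigma'*S + beta'*A + alpha'*B + kappa'*C for one analyte on one column. It is not the PLS or ANN model described in the abstract. The class names, function names, and all numeric values are hypothetical, and the conversion to retention time assumes isocratic elution with a known column dead time t0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """HSM column descriptors plus the ethylbenzene (EB) retention factor."""
    H: float     # hydrophobicity
    S: float     # steric hindrance
    A: float     # hydrogen-bond acidity
    B: float     # hydrogen-bond basicity
    C: float     # ionic interaction at the mobile-phase pH in use (2.8 or 7)
    k_eb: float  # retention factor of ethylbenzene on this column


@dataclass(frozen=True)
class Solute:
    """Analyte coefficients paired with H, S, A, B, C (hypothetical values)."""
    eta: float
    sigma: float
    beta: float
    alpha: float
    kappa: float


def log_alpha(solute: Solute, column: Column) -> float:
    """log10(k / k_EB) from the HSM relation; note the minus sign on the S term."""
    return (solute.eta * column.H
            - solute.sigma * column.S
            + solute.beta * column.A
            + solute.alpha * column.B
            + solute.kappa * column.C)


def retention_factor(solute: Solute, column: Column) -> float:
    """Retention factor k of the analyte on the given column."""
    return column.k_eb * 10 ** log_alpha(solute, column)


def retention_time(solute: Solute, column: Column, t0: float) -> float:
    """Isocratic retention time, t_R = t0 * (1 + k), for dead time t0."""
    return t0 * (1.0 + retention_factor(solute, column))


if __name__ == "__main__":
    # Made-up numbers for illustration only; not taken from GMCRT or any database.
    source = Column(H=1.00, S=0.00, A=0.10, B=0.00, C=0.20, k_eb=8.0)
    target = Column(H=0.85, S=0.03, A=-0.20, B=0.02, C=0.60, k_eb=6.0)
    analyte = Solute(eta=-0.5, sigma=0.3, beta=0.1, alpha=0.2, kappa=0.4)
    for name, col in (("source", source), ("target", target)):
        print(name, round(retention_time(analyte, col, t0=1.2), 3))
```

In this sketch the analyte coefficients are taken as given. In a transfer setting they would have to be estimated from retention measured on source columns, which is the role the retrainable ML models play in the approach described above.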

Keywords: Artificial neural network (ANN); Hydrophobic subtraction model (HSM); Machine learning (ML); Partial least squares (PLS); Quantitative structure retention relation (QSRR); Retention time modeling.

MeSH terms

  • Chromatography, High Pressure Liquid / methods
  • Chromatography, Reverse-Phase* / methods
  • Hydrogen Bonding
  • Hydrogen-Ion Concentration
  • Hydrophobic and Hydrophilic Interactions*
  • Pharmaceutical Preparations / analysis
  • Pharmaceutical Preparations / chemistry

Substances

  • Pharmaceutical Preparations