QSAR modeling based on the bias/variance compromise: a harmonious and parsimonious approach

J Comput Aided Mol Des. 2004 Jul-Sep;18(7-9):537-47. doi: 10.1007/s10822-004-4063-5.

Abstract

Modeling of quantitative structure-activity relationships (QSAR) is considered with an emphasis on prediction. Many methods are available for developing such models. Using a harmonious approach that balances the bias and variance of predictions, the best calibration models are identified relative to the bias and variance criteria used. The criteria used to assess model adequacy are the root mean square error of calibration (RMSEC) and of validation (RMSEV), the respective R² values, and the norm of the regression vector. QSAR data from the literature are used to demonstrate the concepts. For these data sets and the criteria used, it is suggested that models obtained by ridge regression (RR) are more harmonious and parsimonious than models obtained by partial least squares (PLS) and principal component regression (PCR) when the data are mean-centered. The most harmonious RR models have the best bias/variance tradeoff, reflected by the smallest RMSEC, RMSEV, and regression vector norms and the largest calibration and validation R² values. The most parsimonious RR models have the smallest effective rank.
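
The criteria named in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes NumPy, mean-centering with calibration statistics only, RMSE computed as a plain root mean square of residuals (no degrees-of-freedom correction), and effective rank taken as the sum of the ridge filter factors s²/(s² + λ), a common definition that the abstract itself does not confirm. The function name `ridge_harmony` and its arguments are hypothetical, and R² is left out for brevity.

```python
import numpy as np

def ridge_harmony(X_cal, y_cal, X_val, y_val, lambdas):
    """Return one tuple (lambda, RMSEC, RMSEV, ||b||, effective rank) per ridge parameter.

    Assumes every lambda is > 0 so the ridge solution is well defined.
    """
    # Mean-center with calibration statistics only (assumption).
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    Xc, yc = X_cal - x_mean, y_cal - y_mean

    # Thin SVD of the centered calibration matrix: Xc = U diag(s) Vt.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uty = U.T @ yc

    results = []
    for lam in lambdas:
        # Ridge regression vector: b = V diag(s / (s^2 + lam)) U^T y.
        b = Vt.T @ (s / (s**2 + lam) * Uty)

        # Bias-type criteria: prediction errors on calibration and validation sets.
        rmsec = np.sqrt(np.mean((yc - Xc @ b) ** 2))
        pred_val = (X_val - x_mean) @ b + y_mean
        rmsev = np.sqrt(np.mean((y_val - pred_val) ** 2))

        # Variance-type criterion: Euclidean norm of the regression vector.
        b_norm = np.linalg.norm(b)

        # Effective rank as the sum of filter factors (assumed definition).
        eff_rank = np.sum(s**2 / (s**2 + lam))

        results.append((lam, rmsec, rmsev, b_norm, eff_rank))
    return results
```

Plotting RMSEC or RMSEV against the regression vector norm over a grid of λ values gives one view of the bias/variance tradeoff. Models near the corner of that curve, where both quantities are small, would be the harmonious candidates in the sense described above.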

MeSH terms

  • Least-Squares Analysis
  • Models, Molecular*
  • Quantitative Structure-Activity Relationship*