Logistic regression is widely used to estimate relative risks (odds ratios) from case-control studies, but when the study exposure is continuous, standard parametric models may not accurately characterize the exposure-response curve. Semi-parametric generalized linear models provide a useful extension: the exposure of interest is modelled flexibly using a regression spline or a smoothing spline, while other variables are modelled using conventional methods. Combined with a model-selection procedure that minimizes a cross-validation score, this approach provides an objective and reproducible way to characterize the exposure-response curve non-parametrically, by one or several models with a favourable bias-variance trade-off. We applied this approach to case-control data to estimate the dose-response relationship between alcohol consumption and risk of oral cancer among African Americans. We did not find a uniquely 'best' model, but results using linear, cubic, and smoothing splines were consistent: there does not appear to be a risk-free threshold for alcohol consumption vis-à-vis the development of oral cancer. This finding was not apparent using a standard step-function model. In our analysis, the cross-validation curve had a local minimum in addition to its global minimum. In general, multiple local minima make the results more difficult to interpret and may present a computational roadblock for non-parametric generalized additive models of multiple continuous exposures. Nonetheless, the semi-parametric approach appears to be a practical advance.
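The semi-parametric idea can be illustrated with a minimal sketch. This is not the authors' code, and several choices are assumptions made only for illustration. The exposure enters a logistic model through a linear regression spline (truncated-power basis with knots at exposure quantiles). A covariate enters linearly. The number of knots is chosen by minimizing a 5-fold cross-validated deviance, which stands in for whatever cross-validation score the paper used. Smoothing splines are not shown. All function names (`linear_spline_basis`, `fit_logistic`, `design`, `cv_deviance`, `select_knots`) are hypothetical. The data are simulated and are not case-control sampled.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Piecewise-linear (linear regression spline) basis: x and (x - k)_+ for each knot k."""
    cols = [x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def design(x, z, knots):
    """Intercept, flexible spline terms for the exposure x, and conventional linear terms for z."""
    return np.column_stack([np.ones_like(x), linear_spline_basis(x, knots), z])

def fit_logistic(X, y, n_iter=50, tol=1e-8, ridge=1e-6):
    """Newton-Raphson (IRLS) for logistic regression; a tiny ridge term (an assumption) aids stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)          # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)                          # IRLS weights
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def cv_deviance(x, z, y, knots, n_folds=5, seed=0):
    """Mean held-out binomial deviance over K folds (the assumed cross-validation score)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds      # balanced random fold labels
    total = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        beta = fit_logistic(design(x[train], z[train], knots), y[train])
        eta = np.clip(design(x[test], z[test], knots) @ beta, -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        total += -2.0 * np.sum(y[test] * np.log(p) + (1 - y[test]) * np.log(1 - p))
    return total / len(y)

def select_knots(x, z, y, max_knots=4, n_folds=5, seed=0):
    """Score 0..max_knots quantile knots; return the minimizer and all scores."""
    scores = {}
    for k in range(max_knots + 1):
        if k > 0:
            knots = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
        else:
            knots = np.array([])                   # 0 knots: exposure is simply linear
        scores[k] = cv_deviance(x, z, y, knots, n_folds, seed)
    best = min(scores, key=scores.get)
    return best, scores

# Usage on simulated data (hypothetical exposure and covariate, for illustration only).
rng = np.random.default_rng(1)
n = 2000
x = rng.gamma(shape=1.5, scale=10.0, size=n)       # hypothetical continuous exposure
z = rng.normal(size=(n, 1))                        # hypothetical adjustment covariate
true_eta = -1.0 + 0.04 * x + 0.5 * z[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_eta))).astype(float)
best, scores = select_knots(x, z, y)
print("knots chosen:", best)
```

Printing every entry of `scores` shows the whole cross-validation curve over the number of knots, not only its minimizer. As the abstract notes, such a curve need not be unimodal, so a local minimum close to the global one may indicate a competing model that is worth reporting.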
Published in 2003 by John Wiley & Sons, Ltd.