The statistical technique of multiple regression, commonly referred to as "multivariable regression," is often used in clinical research to quantify the relationships between multiple predictor variables and a single outcome variable of interest. The foundational theory underpinning multivariable regression assumes that all predictor variables are independent of one another. In other words, the effect of each predictor variable is measured by its contribution to the regression equation while all other variables are held constant. When two or more predictor variables are correlated, however, it is impossible to change one variable without a consequent change in the variable(s) to which it is linked. This condition, known as "multicollinearity," can introduce errors into multivariable regression models by distorting the estimates of the regression coefficients that quantify the relationship between individual predictor variables and the outcome variable. Errors that arise when the assumption of independence among predictors is violated are of special interest to radiation oncology researchers. Because variables derived from points along an individual organ's dose-volume histogram (DVH) curve are highly correlated with one another, and because dose-volume parameters in neighboring organs are also strongly intercorrelated, dosimetric analyses are particularly subject to multicollinearity errors. For example, dose-volume parameters for the heart are not only strongly correlated with other points along the heart DVH curve but are likely also correlated with dose-volume parameters in neighboring organs such as the lung. In this paper, we describe the problem of multicollinearity in accessible terms and discuss examples of violations of the independence assumption within the radiation oncology literature. Finally, we provide recommendations regarding best practices for identifying and managing multicollinearity in complex data sets.
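As a rough illustration of how correlated dose-volume predictors can be detected, the following Python sketch computes variance inflation factors (VIFs) on simulated data. The VIF is one commonly used diagnostic and is an assumption here; it is not named in the text above. The variable names (`heart_v20`, `heart_v30`, `lung_v20`, `age`), the simulated values, and the helper `variance_inflation_factors` are hypothetical and chosen only for illustration.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    predictor j on all of the other predictors plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r_squared = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Simulated (not real) data: two points on the same hypothetical heart
# DVH curve, one parameter from a neighboring organ, and one unrelated
# clinical covariate.
rng = np.random.default_rng(0)
n = 500
heart_v20 = rng.normal(30, 10, n)
heart_v30 = 0.8 * heart_v20 + rng.normal(0, 2, n)   # strongly tied to heart_v20
lung_v20 = 0.3 * heart_v20 + rng.normal(0, 10, n)   # weakly tied to heart_v20
age = rng.normal(65, 8, n)                          # independent of the rest

names = ["heart_v20", "heart_v30", "lung_v20", "age"]
X = np.column_stack([heart_v20, heart_v30, lung_v20, age])
vifs = variance_inflation_factors(X)

for name, vif in zip(names, vifs):
    print(f"{name}: VIF = {vif:.2f}")
```

In this simulation the two heart parameters receive much larger VIFs than the other predictors. A VIF above roughly 5 to 10 is often treated as a warning sign, although that cutoff is a rule of thumb rather than a recommendation taken from the text.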
Copyright © 2023 Elsevier Inc. All rights reserved.