Radiomic approaches in precision medicine are promising, but variation associated with image acquisition factors can result in severe biases and low generalizability. Multicenter datasets used in these studies are often heterogeneous in multiple imaging parameters and/or have missing information, resulting in multimodal radiomic feature distributions. ComBat is a promising harmonization tool, but it only harmonizes by single/known variables and assumes standardized input data are normally distributed. We propose a procedure that sequentially harmonizes for multiple batch effects in an optimized order, called OPNested ComBat. Furthermore, we propose to address bimodality by employing a Gaussian Mixture Model (GMM) grouping considered as either a batch variable (OPNested + GMM) or as a protected clinical covariate (OPNested - GMM). Methods were evaluated on features extracted with CapTK and PyRadiomics from two public lung computed tomography (CT) datasets. We found that OPNested ComBat improved harmonization performance over standard ComBat. OPNested + GMM ComBat exhibited the best harmonization performance but the lowest predictive performance, while OPNested - GMM ComBat showed poorer harmonization performance, but the highest predictive performance. Our findings emphasize that improved harmonization performance is no guarantee of improved predictive performance, and that these methods show promise for superior standardization of datasets heterogeneous in multiple or unknown imaging parameters and greater generalizability.
© 2022. The Author(s).