Impact of feature harmonization on radiogenomics analysis: Prediction of EGFR and KRAS mutations from non-small cell lung cancer PET/CT images

Comput Biol Med. 2022 Mar:142:105230. doi: 10.1016/j.compbiomed.2022.105230. Epub 2022 Jan 11.

Abstract

Objective: To investigate the impact of harmonization on the performance of CT, PET, and fused PET/CT radiomic features in predicting the mutation status of the epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene (KRAS) genes in non-small cell lung cancer (NSCLC) patients.

Methods: Radiomic features were extracted from tumors delineated on CT, PET, and wavelet-fused PET/CT images obtained from 136 patients with histologically proven NSCLC. Univariate and multivariate predictive models were developed using radiomic features before and after ComBat harmonization to predict EGFR and KRAS mutation status. Multivariate models were built using minimum redundancy maximum relevance feature selection and a random forest classifier. The patient dataset was split 70%/30% into training and testing sets, and the procedure was repeated 10 times. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were used to assess model performance. The performance of the univariate and multivariate models before and after ComBat harmonization was compared using statistical analyses.
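To make the idea of removing a batch effect concrete, the following is a minimal sketch of a per-batch location and scale adjustment for a single radiomic feature. It is a simplified stand-in and not the ComBat method used in the study: it omits ComBat's empirical Bayes shrinkage and does not preserve biological covariates. The function name `harmonize_location_scale`, the scanner labels, and the feature values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def harmonize_location_scale(values, batches):
    """Align each batch's mean and spread to the pooled mean and spread.

    Simplified illustration only (assumption): no empirical Bayes
    shrinkage and no covariate preservation, unlike full ComBat.
    """
    pooled_mean = mean(values)
    pooled_sd = pstdev(values)
    out = [0.0] * len(values)
    for batch in set(batches):
        idx = [i for i, label in enumerate(batches) if label == batch]
        batch_vals = [values[i] for i in idx]
        b_mean = mean(batch_vals)
        b_sd = pstdev(batch_vals) or 1.0  # guard against zero spread
        for i in idx:
            # Standardize within the batch, then rescale to the pooled level.
            out[i] = (values[i] - b_mean) / b_sd * pooled_sd + pooled_mean
    return out


# Hypothetical feature values from two scanners with different offset and scale.
feature = [1.0, 2.0, 3.0, 11.0, 13.0, 15.0]
scanner = ["A", "A", "A", "B", "B", "B"]
harmonized = harmonize_location_scale(feature, scanner)
```

After the adjustment, both hypothetical scanners share the same mean and spread for this feature, so a downstream classifier can no longer separate patients by scanner from this feature alone.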

Results: In univariate modeling, the performance of most features improved significantly for EGFR prediction after harmonization, whereas most features showed no significant difference for KRAS prediction. Average AUCs of all multivariate predictive models for both EGFR and KRAS were significantly improved (q-value < 0.05) following ComBat harmonization. The range of mean AUCs increased following harmonization from 0.87-0.90 to 0.92-0.94 for EGFR, and from 0.85-0.90 to 0.91-0.94 for KRAS. The highest performance was achieved by the harmonized F_R0.66_W0.75 model, with AUCs of 0.94 and 0.93 for EGFR and KRAS, respectively.
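As a reading aid for the AUC values above, the AUC can be interpreted as the probability that a randomly chosen mutation-positive patient receives a higher model score than a randomly chosen mutation-negative patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the labels and scores are hypothetical, and this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical mutation labels (1 = mutant) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 pairs are ranked correctly: 0.75
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of this size; rank-based formulas give the same value more efficiently on larger datasets.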

Conclusion: In univariate modeling, ComBat harmonization generally had a more favorable impact on features for EGFR than for KRAS status prediction, but its effect was feature-dependent, so no systematic effect was observed. In multivariate modeling, ComBat harmonization significantly improved the performance of all radiomics models for predicting EGFR and KRAS mutation status in lung cancer patients. Thus, by eliminating the batch effect in multi-centric radiomic feature sets, harmonization is a promising tool for developing robust and reproducible radiomics using large and heterogeneous datasets.

Keywords: Artificial intelligence; Computed tomography; Harmonization; Imaging genomics; Non-small cell lung cancer; Positron emission tomography.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Carcinoma, Non-Small-Cell Lung* / diagnostic imaging
  • Carcinoma, Non-Small-Cell Lung* / genetics
  • ErbB Receptors / genetics
  • Humans
  • Lung Neoplasms* / diagnostic imaging
  • Lung Neoplasms* / genetics
  • Lung Neoplasms* / pathology
  • Mutation / genetics
  • Positron Emission Tomography Computed Tomography / methods
  • Proto-Oncogene Proteins p21(ras) / genetics

Substances

  • KRAS protein, human
  • EGFR protein, human
  • ErbB Receptors
  • Proto-Oncogene Proteins p21(ras)