Impact of 18F-FDG PET Intensity Normalization on Radiomic Features of Oropharyngeal Squamous Cell Carcinomas and Machine Learning-Generated Biomarkers

J Nucl Med. 2024 May 1;65(5):803-809. doi: 10.2967/jnumed.123.266637.

Abstract

We aimed to investigate the effects of 18F-FDG PET voxel intensity normalization on radiomic features of oropharyngeal squamous cell carcinoma (OPSCC) and on machine learning-generated radiomic biomarkers.

Methods: We extracted 1,037 radiomic features from 18F-FDG PET images, quantifying the shape, intensity, and texture of 430 OPSCC primary tumors. The reproducibility of each feature across the raw PET data and 3 intensity-normalized images (body-weight SUV and the reference-tissue activity ratios to the lentiform nucleus of the brain and to the cerebellum) was assessed using the intraclass correlation coefficient (ICC). We investigated the effects of intensity normalization on the features' utility in predicting the human papillomavirus (HPV) status of OPSCCs using univariate logistic regression, receiver-operating-characteristic analysis, and extreme-gradient-boosting (XGBoost) machine-learning classifiers.

Results: Of the 1,037 features, 356 (34.3%), 608 (58.6%), and 73 (7.0%) attained a high (ICC ≥ 0.90), medium (0.90 > ICC ≥ 0.75), and low (ICC < 0.75) degree of reproducibility across normalization methods, respectively. In univariate analysis, features from the PET images normalized to the lentiform nucleus had the strongest association with HPV status: 865 of 1,037 features (83.4%) were significant after correction for multiple testing, with a median area under the receiver-operating-characteristic curve (AUC) of 0.65 (interquartile range, 0.62-0.68). XGBoost models showed similar tendencies, with the lentiform nucleus-normalized model achieving the numerically highest average AUC, 0.72 (SD, 0.07), in cross-validation within the training cohort. This model generalized well to the validation cohorts, attaining an AUC of 0.73 (95% CI, 0.60-0.85) in independent validation and 0.76 (95% CI, 0.58-0.95) in external validation. However, the AUCs of the XGBoost models did not differ significantly from one another.
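The four image versions compared in the study differ only in how voxel values are scaled. As a minimal sketch, assuming a NumPy activity-concentration volume in Bq/mL that is already decay-corrected to the same reference time as the injected activity, a tissue density of 1 g/mL, and a boolean reference mask (a hypothetical stand-in for a lentiform-nucleus or cerebellum segmentation), the normalizations could be computed as below. The function names and the use of the mean uptake as the reference statistic are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def suv_bw(activity_bq_per_ml: np.ndarray, injected_bq: float, body_weight_kg: float) -> np.ndarray:
    """Body-weight SUV: activity concentration / (injected activity / body weight).

    Assumes decay correction is already applied and tissue density is 1 g/mL,
    so Bq/mL divided by Bq/g yields a dimensionless SUV.
    """
    body_weight_g = body_weight_kg * 1000.0
    return activity_bq_per_ml / (injected_bq / body_weight_g)


def reference_ratio(image: np.ndarray, reference_mask: np.ndarray) -> np.ndarray:
    """Divide every voxel by the mean uptake inside a reference-tissue mask.

    The mask is a hypothetical lentiform-nucleus or cerebellum segmentation;
    using the mean (rather than, e.g., the median) is an assumption.
    """
    ref_mean = image[reference_mask].mean()
    if ref_mean <= 0:
        raise ValueError("reference region must have positive mean uptake")
    return image / ref_mean


# Synthetic example volume; values are illustrative, not study data.
rng = np.random.default_rng(0)
raw = rng.uniform(1_000.0, 20_000.0, size=(8, 8, 8))  # Bq/mL
mask = np.zeros(raw.shape, dtype=bool)
mask[2:4, 2:4, 2:4] = True  # placeholder reference VOI

suv = suv_bw(raw, injected_bq=300e6, body_weight_kg=75.0)
ratio_raw = reference_ratio(raw, mask)
ratio_suv = reference_ratio(suv, mask)
print(np.allclose(ratio_raw, ratio_suv))  # True: SUV is a global rescaling
```

Because body-weight SUV rescales a patient's whole image by a single constant, the reference-tissue ratio is the same whether computed from raw or SUV data. What differs between methods is the per-patient scaling factor, which is one plausible reason intensity-dependent features can disagree across normalizations while shape features cannot.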
Conclusion: Only one third of the features demonstrated a high degree of reproducibility across intensity-normalization techniques, making uniform normalization a prerequisite for interindividual comparability of radiomic markers. The choice of normalization technique may affect the predictive value of radiomic features with respect to HPV status. Our results suggest a trend toward better model performance with normalization to the lentiform nucleus, although more evidence is needed to draw a firm conclusion.
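The reproducibility tiers rest on a per-feature ICC across the four image versions, but the abstract does not state which ICC form or feature preprocessing was used. As a hedged illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement ICC(2,1) of Shrout and Fleiss, computed on a tumors × normalizations matrix for one feature, and then applies the abstract's thresholds. All function names are hypothetical, and the final summary merely reproduces the percentage arithmetic from the reported counts using placeholder ICC values.

```python
import numpy as np


def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` has shape (n_tumors, k_normalizations) for a single feature.
    The choice of ICC(2,1) is an assumption, not taken from the paper.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between tumors
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between normalizations
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def reproducibility_tier(icc: float) -> str:
    """Map an ICC onto the thresholds reported in the abstract."""
    if icc >= 0.90:
        return "high"
    if icc >= 0.75:
        return "medium"
    return "low"


def tier_summary(iccs) -> dict:
    """Count features per tier and give each count as a percentage (1 decimal)."""
    tiers = [reproducibility_tier(v) for v in iccs]
    total = len(tiers)
    return {
        t: (tiers.count(t), round(100 * tiers.count(t) / total, 1))
        for t in ("high", "medium", "low")
    }


# Synthetic feature: 430 tumors x 4 image versions with small method noise.
rng = np.random.default_rng(1)
true_value = rng.normal(size=(430, 1))
feature = true_value + 0.1 * rng.normal(size=(430, 4))
print(reproducibility_tier(icc_2_1(feature)))

# Placeholder ICCs that reproduce the reported tier counts (arithmetic check only).
placeholder_iccs = [0.95] * 356 + [0.80] * 608 + [0.50] * 73
summary = tier_summary(placeholder_iccs)
print(summary)  # {'high': (356, 34.3), 'medium': (608, 58.6), 'low': (73, 7.0)}
```

ICC(2,1) penalizes systematic offsets between normalizations as well as random disagreement; a consistency form such as ICC(3,1) would ignore column offsets, so the tier counts could shift depending on which form is chosen.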

Keywords: PET; SUV; machine learning; normalization; radiomics.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Biomarkers, Tumor / metabolism
  • Carcinoma, Squamous Cell / diagnostic imaging
  • Female
  • Fluorodeoxyglucose F18*
  • Humans
  • Image Processing, Computer-Assisted / methods
  • Machine Learning*
  • Male
  • Middle Aged
  • Oropharyngeal Neoplasms* / diagnostic imaging
  • Positron-Emission Tomography / methods
  • Radiomics
  • Reproducibility of Results

Substances

  • Fluorodeoxyglucose F18
  • Biomarkers, Tumor