Current radiomic studies of head and neck squamous cell carcinomas (HNSCC) are typically based on datasets combining tumors from different locations, assuming that the radiomic features are similar based on histopathologic characteristics. However, molecular pathogenesis and treatment in HNSCC substantially vary across different tumor sites. It is not known if a statistical difference exists between radiomic features from different tumor sites and how they affect machine learning model performance in endpoint prediction. To answer these questions, we extracted radiomic features from contrast-enhanced neck computed tomography scans (CTs) of 605 patients with HNSCC originating from the oral cavity, oropharynx, and hypopharynx/larynx. The difference in radiomic features of tumors from these sites was assessed using statistical analyses and Random Forest classifiers on the radiomic features with 10-fold cross-validation to predict tumor sites, nodal metastasis, and HPV status. We found statistically significant differences (p-value ≤ 0.05) between the radiomic features of HNSCC depending on tumor location. We also observed that differences in quantitative features among HNSCC from different locations impact the performance of machine learning models. This suggests that radiomic features may reveal biologic heterogeneity complementary to current gold standard histopathologic evaluation. We recommend considering tumor site in radiomic studies of HNSCC.
Keywords: classification; head and neck squamous cell carcinomas; human papilloma virus; machine learning; metastasis; radiomics.