Vision Transformer-Based Multilabel Survival Prediction for Oropharynx Cancer After Radiation Therapy

Int J Radiat Oncol Biol Phys. 2024 Mar 15;118(4):1123-1134. doi: 10.1016/j.ijrobp.2023.10.022. Epub 2023 Nov 7.

Abstract

Purpose: A reliable and comprehensive cancer prognosis model for oropharyngeal cancer (OPC) could better assist in personalizing treatment. In this work, we developed a vision transformer-based (ViT-based) multilabel model with multimodal input to learn complementary information from available pretreatment data and predict multiple associated endpoints for radiation therapy for patients with OPC.

Methods and materials: A publicly available data set of 512 patients with OPC was used for both model training and evaluation. Planning computed tomography images, primary gross tumor volume masks, and 16 clinical variables representing patient demographics, diagnosis, and treatment were used as inputs. To extract deep image features with global attention, we used a ViT module. Clinical variables were concatenated with the learned image features and fed into fully connected layers to incorporate cross-modality features. To learn the mapping between the features and correlated survival outcomes, including overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, we employed 4 multitask logistic regression layers. The proposed model was optimized by combining the multitask logistic regression negative log-likelihood losses of the different prediction targets.
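As an illustration of the loss described above, the following is a minimal sketch of a multitask logistic regression (MTLR) negative log-likelihood for one patient, together with one possible way of combining the losses of the four endpoints. It is not the authors' implementation. The function names, the discretization of follow-up time into bins, the treatment of censored patients (event assumed to fall in the censoring bin or later), and the weighted-sum combination with equal default weights are all assumptions made for this example; the abstract states only that the per-endpoint losses were combined.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mtlr_sequence_scores(logits):
    """Score every legal survival sequence for m time points.

    logits[j] is the output of the MTLR layer for time point j.
    scores[k] is the score of "event occurs in bin k" (sum of logits[k:]);
    scores[m] = 0 is the score of "no event within the time horizon".
    """
    scores = [0.0] * (len(logits) + 1)
    for k in range(len(logits) - 1, -1, -1):
        scores[k] = scores[k + 1] + logits[k]
    return scores


def mtlr_nll(logits, time_bin, observed):
    """Negative log-likelihood for one patient and one endpoint.

    observed=True:  the event was seen in bin `time_bin`.
    observed=False: the patient was censored in bin `time_bin`; the
                    likelihood sums over all bins from `time_bin` onward
                    (an assumed censoring convention).
    """
    scores = mtlr_sequence_scores(logits)
    log_z = logsumexp(scores)
    if observed:
        log_num = scores[time_bin]
    else:
        log_num = logsumexp(scores[time_bin:])
    return log_z - log_num


def combined_nll(per_endpoint, weights=None):
    """Weighted sum of per-endpoint losses (equal weights are an assumption).

    per_endpoint maps an endpoint name to (logits, time_bin, observed).
    """
    weights = weights or {}
    return sum(
        weights.get(name, 1.0) * mtlr_nll(logits, time_bin, observed)
        for name, (logits, time_bin, observed) in per_endpoint.items()
    )
```

With all logits equal to zero and three time points, the four legal sequences are equally likely, so an observed event has a loss of log 4 regardless of the bin, while a patient censored in the first bin contributes a loss of zero.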

Results: We used the concordance index (C-index) and the area under the curve (AUC) to assess the performance of our model for time-to-event prediction and time-specific binary prediction, respectively. Our proposed model outperformed corresponding single-modality and single-label models on all prediction labels, achieving C-indices of 0.773, 0.765, 0.776, and 0.773 for overall survival, local failure-free survival, regional failure-free survival, and distant failure-free survival, respectively. The AUC values ranged between 0.799 and 0.844 for different tasks at different time points. Using the medians of the predicted risks as thresholds to separate patients into high-risk and low-risk groups, we performed the log-rank test, which showed significant separation between the two groups across the event-free survival endpoints.
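For readers unfamiliar with these evaluation steps, the sketch below shows one common way to compute a C-index (Harrell's formulation) and to split patients at the median predicted risk. It is an illustrative example only, not the evaluation code used in the study. The function names, the handling of tied risks (counted as half concordant), the exclusion of pairs with tied times, and the use of a strict "greater than median" rule for the high-risk group are assumptions.

```python
from statistics import median


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times:  follow-up time per patient
    events: True if the event was observed, False if censored
    risks:  predicted risk score (higher means earlier expected event)

    A pair is comparable when the patient with the shorter time had an
    observed event. Pairs with equal times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def split_by_median_risk(risks):
    """Return True for patients whose risk is above the median (high-risk group)."""
    threshold = median(risks)
    return [r > threshold for r in risks]
```

The two groups returned by split_by_median_risk would then be compared with a log-rank test from a survival analysis package; that step is omitted here to keep the example dependency-free.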

Conclusion: We developed the first model capable of predicting multiple labels for OPC simultaneously. Our model demonstrated better prognostic ability for all the prediction targets compared with corresponding single-modality models and single-label models.

MeSH terms

  • Humans
  • Oropharyngeal Neoplasms* / diagnostic imaging
  • Oropharyngeal Neoplasms* / pathology
  • Oropharyngeal Neoplasms* / radiotherapy
  • Prognosis
  • Progression-Free Survival
  • Risk Factors
  • Tomography, X-Ray Computed