Determining risk and predictors of head and neck cancer treatment-related lymphedema: A clinicopathologic and dosimetric data mining approach using interpretable machine learning and ensemble feature selection

Clin Transl Radiat Oncol. 2024 Feb 28:46:100747. doi: 10.1016/j.ctro.2024.100747. eCollection 2024 May.

Abstract

Background and purpose: The ability to determine the risk and predictors of lymphedema is vital in improving the quality of life for head and neck (HN) cancer patients. However, selecting robust features is challenging due to the multicollinearity and high dimensionality of radiotherapy (RT) data. This study aims to overcome these challenges using an ensemble feature selection technique with machine learning (ML).

Materials and methods: Thirty organs-at-risk, including bilateral cervical lymph node levels, were contoured, and dose-volume data were extracted from 76 HN treatment plans. Clinicopathologic data were collected. Ensemble feature selection was used to reduce the number of features. The reduced feature set served as input to ML and competing risk models: the ML models were evaluated on their ability to predict internal and external lymphedema, and the competing risk models were used to estimate time to lymphedema event and to stratify risk.
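
As an illustration only, ensemble feature selection of the kind described above can be sketched as rank aggregation across several scoring methods, followed by a collinearity filter. The abstract does not name the selectors, thresholds, or software used, so the two scorers, the `max_corr` cutoff, the function names, and the toy data below are all assumptions rather than the study's pipeline.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def mean_gap(x, y):
    """Absolute difference in class means, scaled by the overall spread."""
    pos = [a for a, label in zip(x, y) if label == 1]
    neg = [a for a, label in zip(x, y) if label == 0]
    spread = pstdev(x)
    if not pos or not neg or spread == 0:
        return 0.0
    return abs(mean(pos) - mean(neg)) / spread

def ranks(scores):
    """Map feature name -> rank (1 = best) for one scoring method."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def ensemble_select(features, y, k, max_corr=0.9):
    """Average the ranks from each scorer, then greedily keep the best-ranked
    features that are not highly correlated with one already kept."""
    scorers = [lambda x: abs(pearson(x, y)), lambda x: mean_gap(x, y)]
    all_ranks = [ranks({n: s(x) for n, x in features.items()}) for s in scorers]
    avg_rank = {n: mean(r[n] for r in all_ranks) for n in features}
    selected = []
    for name in sorted(avg_rank, key=avg_rank.get):
        if all(abs(pearson(features[name], features[s])) < max_corr
               for s in selected):
            selected.append(name)
        if len(selected) == k:
            break
    return selected

# Hypothetical toy data: 8 patients, binary lymphedema outcome.
y = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "dose_level_a": [10, 12, 11, 13, 30, 32, 31, 33],
    "dose_level_b": [20, 24, 22, 26, 60, 64, 62, 66],  # collinear with dose_level_a
    "nodes_removed": [5, 20, 8, 25, 15, 30, 18, 40],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
}
selected = ensemble_select(features, y, k=2)
print(selected)
```

On this toy data only one of the two collinear dose features is retained, which is the behavior that motivates combining selectors with a redundancy check when dose-volume metrics from neighboring structures are strongly correlated.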

Results: Two ML models, XGBoost and random forest, exhibited robust prediction performance. They achieved average F1-scores and AUCs of 84 ± 3.3 % and 79 ± 11.9 % (external lymphedema), and 64 ± 12 % and 78 ± 7.9 % (internal lymphedema). Predictive ML and risk models identified common predictors, including bulky node involvement, high dose to various lymph node levels, and lymph nodes removed during surgery. At 180 days, removal of 0-25, 26-50, and > 50 lymph nodes was associated with external lymphedema risks of 72.1 %, 95.6 %, and 57.7 %, respectively (p = 0.01).
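
A risk at a fixed time point, such as the 180-day figures above, is commonly obtained in a competing risk setting from a cumulative incidence function. The sketch below uses the standard nonparametric (Aalen-Johansen type) estimator. The abstract does not state which estimator, which competing event, or which software was used, so the event coding (0 = censored, 1 = lymphedema, 2 = competing event), the function name, and the toy data are assumptions for illustration.

```python
from itertools import groupby

def cumulative_incidence(times, events, cause, horizon):
    """Nonparametric cumulative incidence of `cause` by time `horizon`.

    events: 0 = censored, any other integer = event cause code (assumed coding).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0  # probability of no event of any cause before current time
    cif = 0.0
    for t, group in groupby(data, key=lambda pair: pair[0]):
        if t > horizon:
            break
        codes = [code for _, code in group]
        d_cause = sum(1 for c in codes if c == cause)
        d_any = sum(1 for c in codes if c != 0)
        # Probability of reaching t event-free, then failing from `cause` at t.
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
        at_risk -= len(codes)  # events and censored subjects leave the risk set
    return cif

# Hypothetical follow-up (days) for 8 patients.
times = [30, 60, 60, 90, 120, 200, 250, 300]
events = [1, 1, 2, 0, 1, 1, 0, 2]
risk_180 = cumulative_incidence(times, events, cause=1, horizon=180)
print(round(risk_180, 3))  # about 0.406 on this toy data
```

Because one patient is censored at day 90, the estimate differs from the naive proportion of patients with lymphedema by day 180 (3/8 = 0.375); the estimator reweights later events by the shrinking risk set.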

Conclusion: Reducing the dimensionality of HN RT data yielded effective ML models for HN lymphedema prediction. Predictive dosimetric features emerged from both predictive and competing risk models. The clinicopathologic predictors identified are consistent with those reported in other studies, which supports our methodology.

Keywords: Assessments, risk; Early onset lymphedema; Explainable AI; Head and neck cancer; Interpretable AI; Lymphedema; Machine learning; Oropharyngeal cancer; Radiation dose response relationship.