Background: Neoadjuvant chemotherapy is the standard treatment for locally advanced esophageal squamous cell carcinoma, but it is often ineffective. Predicting the response before treatment is therefore desirable, yet no established method for doing so currently exists. This study aims to build a deep-learning model that predicts the response of esophageal squamous cell carcinoma to preoperative chemotherapy using multimodal data that integrate esophageal endoscopic images and clinical information.
Methods: A total of 170 patients with locally advanced esophageal squamous cell carcinoma were studied retrospectively, and endoscopic images and clinical information obtained before neoadjuvant chemotherapy were collected. Endoscopic images alone and endoscopic images plus clinical information were each analyzed with a deep-learning model based on ResNet50. Clinical information alone was analyzed using logistic regression models. The area under the receiver operating characteristic curve was calculated to compare the performance of the models. Gradient-weighted Class Activation Mapping was applied to the endoscopic images to examine which image regions the model attended to.
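The area under the receiver operating characteristic curve used for the model comparison can be illustrated with a minimal sketch. It is not the study's code: it assumes binary labels (1 for responders, 0 for non-responders, a hypothetical encoding) and uses the rank-sum (Mann-Whitney U) identity rather than any particular library. The function name `roc_auc` and the toy values are invented for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for a responder, anything else counts as a non-responder
            (hypothetical encoding, not taken from the study).
    scores: model-predicted probability (or score) of response.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Rank all scores in ascending order; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of
    # positive/negative pairs, equals the AUC.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking of responders above non-responders, and 1.0 to perfect separation, which is the scale on which the three models are compared.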
Results: The areas under the curve for clinical information alone, endoscopic images alone, and both combined were 0.64, 0.55, and 0.77, respectively. The model combining endoscopic images and clinical information performed significantly better than the other models. This model focused more on the tumor when trained with clinical information.
Conclusions: The deep-learning model developed in this study suggests that gastrointestinal endoscopic imaging, combined with other clinical information, has the potential to predict the efficacy of neoadjuvant chemotherapy in locally advanced esophageal squamous cell carcinoma before treatment.
Keywords: Esophageal squamous cell carcinoma; Multimodal machine learning; Neoadjuvant chemotherapy; Prediction.
© 2025. The Author(s) under exclusive licence to The Japan Esophageal Society.