Purpose: To establish an explainable 18F-FDG PET/CT-derived prediction model to identify EGFR mutation status and subtypes (EGFR wild-type, EGFR-E19, and EGFR-E21) in lung adenocarcinoma (LUAD).
Methods: Baseline 18F-FDG PET/CT images of 478 patients with LUAD from 2 hospitals were collected. Data from hospital A (n = 390) were randomly split into a training group (n = 312) and an internal test group (n = 78), and data from hospital B (n = 88) were used for external testing. A total of 4,760 handcrafted radiomics features (HRFs) were extracted from the PET/CT scans. Candidate prediction models were constructed by cross-combining 11 feature selection methods with 7 classifiers. The optimal model was determined by combining the results of cross-center data validation and model visualization (Yellowbrick). Predictive performance was assessed via receiver operating characteristic curves, confusion matrices, and classification reports. Four explainable artificial intelligence technologies were used to interpret the optimal model.
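The cross-combination step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, only 2 selectors and 2 classifiers stand in for the study's 11 and 7, scikit-learn estimators replace whatever implementations the authors used, and the names `selectors`, `classifiers`, and `results` are hypothetical.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic 3-class data standing in for the radiomics feature table (assumption).
X, y = make_classification(
    n_samples=390, n_features=60, n_informative=8,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hold out 78 of 390 samples, mirroring the 312/78 split sizes in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=78, stratify=y, random_state=0
)

# Factories so that every candidate gets fresh, unfitted estimators.
selectors = {
    "anova": lambda: SelectKBest(f_classif, k=8),
    "random_forest": lambda: SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        max_features=8, threshold=-np.inf,  # keep the 8 most important features
    ),
}
classifiers = {
    "logistic": lambda: LogisticRegression(max_iter=1000),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
}

# One pipeline per (selector, classifier) pair, scored by macro-average AUC.
results = {}
for (s_name, make_sel), (c_name, make_clf) in product(
    selectors.items(), classifiers.items()
):
    pipe = Pipeline([("select", make_sel()), ("clf", make_clf())])
    pipe.fit(X_train, y_train)
    proba = pipe.predict_proba(X_test)
    results[(s_name, c_name)] = roc_auc_score(
        y_test, proba, multi_class="ovr", average="macro"
    )

best = max(results, key=results.get)
```

With 11 selectors and 7 classifiers, the same loop would yield 11 × 7 = 77 candidate models. Ranking by a single test-set score, as `best` does here, is a simplification; the text states that the optimal model was chosen using cross-center validation together with model visualization.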
Results: Sex and SUVmax were selected as clinical risk factors and were combined with 8 robust PET/CT HRFs to establish the models. The best performance was obtained by combining a light gradient boosting machine classifier with the random forest feature selection method, which achieved a macro-average AUC of 0.75 in the internal test group and 0.81 in the external test group.
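For readers unfamiliar with the metric, a macro-average AUC for a three-class problem can be computed as the unweighted mean of one-vs-rest AUCs. The sketch below assumes that one-vs-rest convention (the text does not state which averaging scheme was used) and uses invented toy probabilities, not study data; `binary_auc` and `macro_auc_ovr` are hypothetical helper names.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc_ovr(y_true, proba, classes):
    """Unweighted mean of one-vs-rest AUCs, one per class."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]  # predicted probability of class k
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy example (invented values): columns follow the order of `classes`.
classes = ["wild-type", "E19", "E21"]
y_true = ["wild-type", "wild-type", "E19", "E19", "E21", "E21"]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
]
macro_auc = macro_auc_ovr(y_true, proba, classes)
```

Because each class contributes equally regardless of its sample count, the macro average is not dominated by the most common class when the classes are imbalanced.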
Conclusion: The explainable EGFR mutation status prediction model has a degree of clinical practicability and good generalization performance, which may help in the timely selection of treatment options and in prognosis prediction for patients with LUAD.
Keywords: Epidermal growth factor receptor; Explainable machine learning; Lung adenocarcinoma; PET/CT; Radiomics.
© 2024. The Author(s).