Background: Robust prediction of survival can facilitate clinical decision-making and patient counselling. Non-Caucasian males are underrepresented in most prostate cancer databases. We evaluated how the performance of a machine learning (ML) algorithm trained to predict survival after radical prostatectomy varies across race subgroups.
Methods: We used the National Cancer Database (NCDB) to identify patients undergoing radical prostatectomy between 2004 and 2016. We grouped patients by race into Caucasian, African-American, or non-Caucasian, non-African-American (NCNAA) subgroups. We trained an Extreme Gradient Boosting (XGBoost) classifier to predict 5-year survival in different training samples: naturally race-imbalanced, race-specific, and synthetically race-balanced. We evaluated performance in the test sets.
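The three kinds of training sample described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the synthetic balancing was performed, so random oversampling with replacement is used here as a stand-in, and the function name `build_training_samples` and the `race` field are hypothetical.

```python
import random
from collections import defaultdict


def build_training_samples(records, group_key="race", seed=0):
    """Return imbalanced, race-specific, and balanced training samples.

    `records` is a list of dicts; `group_key` names the subgroup field
    (a hypothetical field name, not taken from the NCDB schema).
    """
    rng = random.Random(seed)

    # Split the records by subgroup.
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[group_key]].append(rec)

    # Balance by oversampling every subgroup up to the largest one.
    # Assumption: random oversampling stands in for whatever synthetic
    # balancing method the study actually used.
    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)

    return {
        "imbalanced": list(records),
        "race_specific": {g: list(rows) for g, rows in by_group.items()},
        "balanced": balanced,
    }
```

A classifier would then be fitted once per sample and each fitted model scored on the held-out test sets, so that only the composition of the training data differs between runs.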
Results: A total of 68,630 patients met inclusion criteria. Of these, 57,635 (84%) were Caucasian, 8,173 (12%) were African-American, and 2,822 (4%) were NCNAA. For the classifier trained in the naturally race-imbalanced sample, the F1 scores were 0.514 (95% confidence interval: 0.513-0.511), 0.511 (0.511-0.512), 0.545 (0.541-0.548), and 0.378 (0.378-0.389) in the race-imbalanced, Caucasian, African-American, and NCNAA test samples, respectively. For all race subgroups, the F1 scores of classifiers trained in the race-specific or synthetically race-balanced samples were similar to those of the classifier trained in the naturally race-imbalanced sample.
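Subgroup F1 scores with confidence intervals of the kind reported above can be computed as in the following sketch. The abstract does not state how its intervals were derived, so the percentile bootstrap used here is an assumption, and the function names are hypothetical.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap interval (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample cases with replacement and rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return f1_score(y_true, y_pred), lo, hi


def f1_by_subgroup(y_true, y_pred, groups, **kwargs):
    """Score each subgroup of the test set separately."""
    results = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        results[g] = bootstrap_f1_ci([y_true[i] for i in idx],
                                     [y_pred[i] for i in idx], **kwargs)
    return results
```

Scoring each subgroup separately, rather than only the pooled test set, is what exposes a gap such as the lower NCNAA score: a pooled metric is dominated by the largest subgroup.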
Conclusions: An ML algorithm trained using NCDB data to predict survival after radical prostatectomy demonstrates variation in performance by race, regardless of whether the algorithm is trained in a naturally race-imbalanced, race-specific, or synthetically race-balanced sample. These results emphasize the importance of thoroughly evaluating ML algorithms in race subgroups before clinical deployment to avoid potential disparities in care.
Keywords: machine learning; prostatectomy; prostatic neoplasms; race; survival.
© 2021 Wiley Periodicals LLC.