Erectile dysfunction (ED) is a form of male sexual dysfunction that imposes significant health and financial burdens worldwide. Despite its high prevalence, ED remains difficult to diagnose because of the limitations of current diagnostic methods and patients' reluctance to seek medical help. Several studies have applied machine learning techniques to develop ED prediction models, but the performance and interpretability of existing models leave room for improvement. This study used data from the 2001 to 2004 cycles of the National Health and Nutrition Examination Survey (NHANES) and followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. After excluding male respondents who did not meet the study criteria, 3,869 participants were included. Three gradient boosting decision tree (GBDT) algorithms (XGBoost, CatBoost, and LightGBM) were used to develop ED prediction models. Data preprocessing, feature selection, model evaluation, and interpretability analysis were performed to ensure the reliability and effectiveness of the models. The AUC values were XGBoost: 0.887 ± 0.016; LightGBM: 0.879 ± 0.016; CatBoost: 0.871 ± 0.019. The F1-scores were XGBoost: 0.695 ± 0.023; LightGBM: 0.681 ± 0.025; CatBoost: 0.681 ± 0.025. The recall values were XGBoost: 0.789 ± 0.026; LightGBM: 0.739 ± 0.030; CatBoost: 0.711 ± 0.030. XGBoost was therefore the best-performing ED prediction model in this study. Interpretability analysis of the XGBoost model showed that age, obesity, cardiovascular risk factors, prostate-related diseases, and socioeconomic status were key features for predicting ED and that these factors play a significant role in the mechanism of ED. We therefore believe that the ED prediction model trained in this study has strong predictive performance and high interpretability. This model can help expand the diagnostic options for ED, improve its diagnosis rate, and assist doctors in intervening early, ultimately improving patient prognosis.
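As a reading aid for the metrics reported above, the following is a minimal sketch of how AUC, recall, and F1-score can be computed and summarized as "mean ± spread". It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical, and it assumes the ± values denote a standard deviation over repeated evaluations, which the abstract does not state.

```python
from statistics import mean, stdev


def recall_f1(y_true, y_pred):
    """Recall and F1-score for binary labels (1 = ED, 0 = no ED)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as half a win. Requires at least one case of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Format repeated-evaluation results as 'mean ± sample standard deviation'."""
    return f"{mean(values):.3f} ± {stdev(values):.3f}"


# Toy example with hypothetical labels and predicted probabilities.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

toy_auc = auc(y_true, scores)
toy_recall, toy_f1 = recall_f1(y_true, y_pred)
toy_summary = summarize([0.88, 0.90, 0.89])  # hypothetical per-fold AUCs
```

AUC depends only on how the predicted scores rank cases, whereas recall and F1-score depend on the chosen decision threshold, which is one reason a model comparison can report all three.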
Keywords: Erectile Dysfunction; Machine learning; National Health and Nutrition Examination Survey; Prediction model; XGBoost.
© 2024. The Author(s).