This study aimed to develop a pre-biopsy diagnostic tool for patients with prostate-specific antigen (PSA) levels < 20 ng/ml, in order to minimize prostate biopsy-related discomfort and risks. Data from 655 patients who underwent transperineal prostate biopsy at the First Affiliated Hospital of Wannan Medical College from July 2021 to January 2023 were collected and analyzed. After the training set was class-balanced with the Synthetic Minority Over-sampling Technique (SMOTE), multiple machine learning models were constructed using the significant variables identified by Least Absolute Shrinkage and Selection Operator (LASSO) feature selection. The best-performing model was selected and evaluated through tenfold cross-validation, and SHapley Additive exPlanations (SHAP) were used to ensure interpretability. Finally, performance was assessed on the test set. Age, prostate-specific antigen mass ratio (PSAMR), Prostate Imaging-Reporting and Data System (PI-RADS) score, and prostate volume were selected by LASSO regression as the variables for model construction. The areas under the receiver operating characteristic (ROC) curve for the models in the validation set were as follows: XGBoost: 0.93 (0.88-0.97); logistic: 0.89 (0.83-0.95); LightGBM: 0.87 (0.80-0.93); AdaBoost: 0.90 (0.85-0.96); GNB: 0.88 (0.82-0.95); CNB: 0.79 (0.71-0.87); MLP: 0.78 (0.69-0.86); and Support Vector Machine: 0.81 (0.73-0.89). XGBoost was selected as the best model and reconstructed with tenfold cross-validation on the training data, resulting in the following areas under the ROC curve: training set 0.995 (0.991-0.999), validation set 0.945 (0.885-0.997), and test set 0.920 (0.868-0.972). The Kolmogorov-Smirnov curve, calibration curve, and learning curve yielded favorable results, and the decision curve indicated that patients with threshold probabilities ranging from 10% to 95% can benefit from this model. We developed an XGBoost machine learning model based on the PSAMR indicator and interpreted it using the SHAP method.
The model offered a high-performance non-invasive technique to diagnose prostate cancer in patients with PSA levels < 20 ng/ml.
Keywords: PSAMR; Prostate cancer; SHAP; SMOTE; XGBoost machine learning model.
© 2025. The Author(s).
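
The two headline metrics in the abstract, the area under the ROC curve and the net benefit plotted in a decision curve, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy labels and scores are hypothetical, and only the Python standard library is used. The area under the curve is computed as the fraction of cancer/non-cancer pairs that the model ranks correctly (the Mann-Whitney formulation), and net benefit follows the usual decision-curve definition, TP/n - (FP/n) * pt/(1 - pt), at threshold probability pt.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pair-counting formulation."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit of acting on every patient with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: every patient is treated as positive (all biopsied)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = cancer on biopsy, scores = predicted risk.
    y_toy = [1, 1, 1, 0, 0, 0, 1, 0]
    s_toy = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    print("AUC:", auc(y_toy, s_toy))
    for pt in (0.1, 0.5, 0.9):
        print(pt, net_benefit(y_toy, s_toy, pt), net_benefit_treat_all(y_toy, pt))
```

A model is considered useful at a given threshold when its net benefit exceeds both the biopsy-all reference and zero (the biopsy-none strategy); a decision curve simply plots these quantities across a range of thresholds.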