Urinary volatile organic compounds (VOCs) based prostate cancer diagnosis via high-dimensional classification

J Appl Stat. 2024 Apr 26;51(16):3468-3485. doi: 10.1080/02664763.2024.2346355. eCollection 2024.

Abstract

Early detection of prostate cancer (PCa) is critical for successful treatment and survival. However, current diagnostic methods such as prostate-specific antigen (PSA) testing and digital rectal examination (DRE) have limitations in accuracy, specificity, and sensitivity. Recent research suggests that urinary volatile organic compounds (VOCs) could serve as biomarkers for PCa diagnosis. In this study, urine samples from 337 PCa-positive and 233 PCa-negative patients were collected to develop a diagnostic model. Because the number of measured VOCs is large, model development is a high-dimensional (HD) classification problem. Our findings reveal that regularized logistic regression outperforms numerous other classifiers on the collected data. In particular, we selected a regularized logistic model with the SCAD (smoothly clipped absolute deviation) penalty as the final model; it attains an AUC (area under the ROC curve) of 0.748, compared with a PSA-based AUC of 0.540. These results underscore the potential of VOC-based diagnosis as a clinically feasible approach to PCa screening.
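
The abstract names two quantities: the SCAD penalty and the AUC. As a minimal sketch, and not the authors' implementation, the Python below writes out the standard SCAD penalty of Fan and Li (2001) for a single coefficient and a pairwise (Mann-Whitney) form of the AUC. The function names `scad_penalty` and `auc` are hypothetical, and the default `a = 3.7` is the conventional choice from the literature, not a value stated in the abstract.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient (standard Fan and Li form).

    Assumption: a = 3.7 is the conventional default, not a value from the study.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same linear penalty as the lasso.
        return lam * t
    if t <= a * lam:
        # Middle range: quadratic taper that relaxes the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (case) or 0 (control).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; these are not values from the study.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
```

Because the penalty flattens out beyond `a * lam`, large coefficients are not shrunk further, which is the usual motivation for preferring SCAD over the lasso when many candidate features are screened. The pairwise AUC is quadratic in sample size, which is acceptable for a few hundred patients.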

Keywords: Classification; PLS-DA; high dimensional (HD) modeling; prostate cancer screening and diagnosis; regularized logistic regression; volatile organic compounds (VOC).

Grants and funding

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health (NCI-NIH) under Award Numbers 1T32GM144919, SC1CA245675 and U54MD007592.