Development and validation of a web-based calculator for determining the risk of psychological distress based on machine learning algorithms: A cross-sectional study of 342 lung cancer patients

Support Care Cancer. 2024 Dec 30;33(1):63. doi: 10.1007/s00520-024-09127-5.

Abstract

Purpose: Early and accurate identification of the risk of psychological distress allows for timely intervention and improved prognosis. Current methods for predicting psychological distress among lung cancer patients using readily available data are limited. This study aimed to develop a robust machine learning (ML) model for determining the risk of psychological distress among lung cancer patients.

Methods: A cross-sectional study was designed to collect data from 342 lung cancer patients. The Least Absolute Shrinkage and Selection Operator (LASSO) was used for feature selection. Model training and validation were conducted with the bootstrap resampling method. Fivefold cross-validation was used to evaluate the model and to optimize it through parameter tuning. Feature importance was assessed using the SHapley Additive exPlanations (SHAP) method.
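
The two resampling schemes named in the Methods can be illustrated with a minimal sketch. The abstract does not say how the bootstrap and the fivefold cross-validation were combined, so the code below only shows each mechanism on its own. The function names, the fixed seed, and the use of plain index lists are assumptions made for illustration and are not taken from the study.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    # Cases never drawn form the out-of-bag set, usable for validation.
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    order = list(range(n))
    rng.shuffle(order)
    base, extra = divmod(n, k)
    folds, start = [], 0
    for fold in range(k):
        # The first `extra` folds each take one additional case.
        size = base + (1 if fold < extra else 0)
        folds.append(order[start:start + size])
        start += size
    return folds


rng = random.Random(0)  # fixed seed: an arbitrary choice for reproducibility
N_PATIENTS = 342        # sample size reported in the abstract

in_bag, out_of_bag = bootstrap_indices(N_PATIENTS, rng)
folds = kfold_indices(N_PATIENTS, 5, rng)
fold_sizes = [len(f) for f in folds]
print(fold_sizes)  # [69, 69, 68, 68, 68]
```

With 342 cases, five folds cannot be equal in size, so two folds hold 69 cases and three hold 68. In each cross-validation round one fold is held out for evaluation and the other four are used for fitting.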

Results: The model identified seven independent risk factors for psychological distress: residence (β = 0.141), diagnosis duration (β = 0.055), TNM stage (β = 0.098), pain severity (β = 0.067), perceived stigma (β = 0.052), illness perception (β = 0.100), and coping style (β = 0.097). Among the eight ML algorithms evaluated, the extreme gradient boosting (XGBoost) algorithm demonstrated the highest performance, with area under the receiver operating characteristic curve (AUROC) values of 0.988, 0.945, and 0.922 for the training, validation, and test sets, respectively. The model's results were further explained using SHAP, which revealed the importance and contribution of each risk factor to the overall distress risk. A web-based tool was developed based on this model to facilitate clinical use.
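
For readers unfamiliar with the AUROC metric reported above, the following minimal sketch shows one standard way to compute it: as the proportion of (distressed, non-distressed) pairs in which the distressed case receives the higher predicted score, with ties counted as one half. The function name and the toy labels and scores are illustrative assumptions. They are not data from the study, and the abstract does not state which software computed the reported values.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    labels: iterable of 0/1 outcomes; scores: predicted risks.
    Tied scores contribute 0.5, matching the rank-based definition.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): three of the four positive-negative
# pairs are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
print(toy_auc)  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The gap between the training value (0.988) and the test value (0.922) is read on this same scale.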

Conclusion: The XGBoost classifier demonstrated exceptional performance, and clinical implementation of the web-based risk calculator can serve as an easy-to-use tool for health practitioners to formulate early prevention and intervention strategies.

Keywords: Factor model; Lung cancer; Machine learning; Model explainability; Psychological distress.

Publication types

  • Validation Study

MeSH terms

  • Adaptation, Psychological
  • Adult
  • Aged
  • Algorithms
  • Cross-Sectional Studies
  • Female
  • Humans
  • Internet*
  • Lung Neoplasms* / pathology
  • Lung Neoplasms* / psychology
  • Machine Learning*
  • Male
  • Middle Aged
  • Psychological Distress*
  • Risk Assessment / methods
  • Risk Factors
  • Stress, Psychological / epidemiology
  • Stress, Psychological / etiology