Development and validation of a web-based calculator for determining the risk of psychological distress based on machine learning algorithms: A cross-sectional study of 342 lung cancer patients

Xu Tian; Haoyang Li; Feili Li; María F Jiménez-Herrera; Yi Ren; Hongcai Shang

doi:10.1007/s00520-024-09127-5

Support Care Cancer. 2024 Dec 30;33(1):63. doi: 10.1007/s00520-024-09127-5.

Authors

Xu Tian^#¹, Haoyang Li^#², Feili Li³, María F Jiménez-Herrera⁴, Yi Ren⁵, Hongcai Shang⁶

Affiliations

¹ Division of Science & Technology and Foreign Affairs, Chongqing Traditional Chinese Medicine Hospital, Chongqing, 400020, China.
² School of Data Science, The Chinese University of Hong Kong, Shenzhen, 518172, China.
³ Department of Nursing, Chongqing Traditional Chinese Medicine Hospital, Chongqing, 400020, China.
⁴ Nursing Department, Universitat Rovira i Virgili, 43002, Tarragona, Spain.
⁵ Department of Classic Traditional Chinese Medicine, Chongqing Traditional Chinese Medicine Hospital, Jiangbei District, No. 6 of the 7th Branch of Panxi Road, Chongqing, 400020, China.
⁶ Dongfang Hospital, Beijing University of Chinese Medicine, Beijing, 101402, China.

^# Contributed equally.

PMID: 39738685
DOI: 10.1007/s00520-024-09127-5

Abstract

Purpose: Early and accurate identification of the risk of psychological distress allows for timely intervention and improved prognosis. Current methods for predicting psychological distress among lung cancer patients using readily available data are limited. This study aimed to develop a robust machine learning (ML) model for determining the risk of psychological distress among lung cancer patients.

Methods: A cross-sectional study was designed to collect data from 342 lung cancer patients. The Least Absolute Shrinkage and Selection Operator (LASSO) was used for feature selection. Model training and validation were conducted with the bootstrap resampling method. Fivefold cross-validation was used to evaluate the model and to optimize it through parameter tuning. Feature importance was assessed using the SHapley Additive exPlanations (SHAP) method.
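The two resampling schemes named in the Methods can be illustrated with a minimal sketch. The abstract does not say how bootstrap resampling and fivefold cross-validation were combined, how many bootstrap replicates were drawn, or which software was used, so the function names, the seed, and the out-of-bag convention below are illustrative assumptions and not the authors' pipeline. Only the sample size (342) and the number of folds (5) come from the abstract.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement (the in-bag set).

    Indices that were never drawn form the out-of-bag set, which is one
    common choice of validation data for a bootstrap replicate (assumed here).
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def k_fold_indices(n_samples, k, rng):
    """Shuffle the indices, cut them into k near-equal folds, and yield
    one (train, held_out) pair per fold."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


rng = random.Random(0)  # fixed seed so the sketch is reproducible (arbitrary choice)
in_bag, out_of_bag = bootstrap_split(342, rng)
splits = list(k_fold_indices(342, 5, rng))
```

In a setup like this, each candidate parameter setting would be scored on the five held-out folds and the best-scoring setting retained; which model or metric is used inside that loop is outside what this sketch shows.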

Results: The model identified seven independent risk factors for psychological distress: residence (β = 0.141), diagnosis duration (β = 0.055), TNM stage (β = 0.098), pain severity (β = 0.067), perceived stigma (β = 0.052), illness perception (β = 0.100), and coping style (β = 0.097). Among the eight ML algorithms evaluated, the extreme gradient boosting (XGBoost) algorithm demonstrated the highest performance, with AUROC values of 0.988, 0.945, and 0.922 for the training, validation, and test sets, respectively. The model's results were further explained using SHAP, which revealed the importance and contribution of each risk factor to the overall distress risk. A web-based tool was developed based on this model to facilitate clinical use.
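For readers less familiar with the AUROC metric reported above, the following minimal sketch computes it from its rank-based definition: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The function name and the four-case toy data are made up for illustration; they are not study data and do not reproduce the reported values.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = distressed, by assumption here).
    scores: iterable of predicted risk scores, higher meaning higher risk.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # positive ranked above negative
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Toy example (hypothetical labels and scores)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ordered correctly: 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the training, validation, and test figures are reported.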

Conclusion: The XGBoost classifier demonstrated exceptional performance, and clinical implementation of the web-based risk calculator can serve as an easy-to-use tool for health practitioners to formulate early prevention and intervention strategies.

Keywords: Factor model; Lung cancer; Machine learning; Model explainability; Psychological distress.

Publication types

Validation Study

MeSH terms

Adaptation, Psychological
Adult
Aged
Algorithms
Cross-Sectional Studies
Female
Humans
Internet*
Lung Neoplasms* / pathology
Lung Neoplasms* / psychology
Machine Learning*
Male
Middle Aged
Psychological Distress*
Risk Assessment / methods
Risk Factors
Stress, Psychological / epidemiology
Stress, Psychological / etiology

Grants and funding

2024MSXM100/Chongqing medical scientific research project (Joint project of Chongqing Health Commission and Science and Technology Bureau)