A Multivariable Prediction Model for Mild Cognitive Impairment and Dementia: Algorithm Development and Validation

JMIR Med Inform. 2024 Nov 22:12:e59396. doi: 10.2196/59396.

Abstract

Background: Mild cognitive impairment (MCI) poses significant challenges in early diagnosis and timely intervention. Underdiagnosis, coupled with the economic and social burden of dementia, necessitates more precise detection methods. Machine learning (ML) algorithms show promise in managing complex data for MCI and dementia prediction.

Objective: This study assessed the predictive accuracy of ML models in identifying the onset of MCI and dementia using the Korean Longitudinal Study of Aging (KLoSA) dataset.

Methods: This study used data from the KLoSA, a comprehensive biennial survey that tracks the demographic, health, and socioeconomic aspects of middle-aged and older Korean adults from 2018 to 2020. Among the 6171 initial households, 4975 eligible older adult participants aged 60 years or older were selected after excluding individuals based on age and missing data. The identification of MCI and dementia relied on self-reported diagnoses, with sociodemographic and health-related variables serving as key covariates. The dataset was categorized into training and test sets to predict MCI and dementia by using multiple models, including logistic regression, light gradient-boosting machine, XGBoost (extreme gradient boosting), CatBoost, random forest, gradient boosting, AdaBoost, support vector classifier, and k-nearest neighbors, and the training and test sets were used to evaluate predictive performance. The performance was assessed using the area under the receiver operating characteristic curve (AUC). Class imbalances were addressed via weights. Shapley additive explanation values were used to determine the contribution of each feature to the prediction rate.

Results: Among the 4975 participants, the best model for predicting MCI onset was random forest, with a median AUC of 0.6729 (IQR 0.3883-0.8152), followed by k-nearest neighbors with a median AUC of 0.5576 (IQR 0.4555-0.6761) and support vector classifier with a median AUC of 0.5067 (IQR 0.3755-0.6389). For dementia onset prediction, the best model was XGBoost, achieving a median AUC of 0.8185 (IQR 0.8085-0.8285), closely followed by light gradient-boosting machine with a median AUC of 0.8069 (IQR 0.7969-0.8169) and AdaBoost with a median AUC of 0.8007 (IQR 0.7907-0.8107). The Shapley values highlighted pain in everyday life, being widowed, living alone, exercising, and living with a partner as the strongest predictors of MCI. For dementia, the most predictive features were other contributing factors, education at the high school level, education at the middle school level, exercising, and monthly social engagement.

Conclusions: ML algorithms, especially XGBoost, exhibited the potential for predicting MCI onset using KLoSA data. However, no model has demonstrated robust accuracy in predicting MCI and dementia. Sociodemographic and health-related factors are crucial for initiating cognitive conditions, emphasizing the need for multifaceted predictive models for early identification and intervention. These findings underscore the potential and limitations of ML in predicting cognitive impairment in community-dwelling older adults.

Keywords: Alzheimer; MCI; aging; algorithm; cognitive; dementia; geriatrics; gerontology; machine learning; machine learning algorithms; mild cognitive impairment; older people; prediction; sociodemographic factors.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Cognitive Dysfunction* / diagnosis
  • Cognitive Dysfunction* / epidemiology
  • Dementia* / diagnosis
  • Dementia* / epidemiology
  • Female
  • Humans
  • Longitudinal Studies
  • Machine Learning
  • Male
  • Middle Aged
  • Republic of Korea / epidemiology