Background: Early prediction of dementia risk is crucial for effective interventions. Given the known etiologic heterogeneity of dementia, machine learning methods that leverage multimodal data, such as clinical manifestations, neuroimaging biomarkers, and well-documented risk factors, could predict dementia more accurately than methods relying on a single data modality.
Objective: This study aims to develop machine learning models that capitalize on neuropsychological (NP) tests, magnetic resonance imaging (MRI) measures, and clinical risk factors for 10-year dementia prediction.
Methods: Participants were drawn from the Framingham Heart Study, with data collected across several modalities, including NP tests, MRI measures, and demographic variables. CatBoost models, tuned with Optuna hyperparameter optimization, were trained to predict 10-year dementia risk from different combinations of data modalities. The contribution of each modality and feature to the prediction was quantified using Shapley values.
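As an illustration of the last step, per-feature Shapley values can be rolled up into a per-modality contribution. The sketch below is a minimal example and not the study's pipeline: it assumes the modality score is the sum of each feature's mean absolute Shapley value, and the feature names, modality labels, and numbers are hypothetical toy values.

```python
from collections import defaultdict


def modality_contributions(shap_rows, feature_names, feature_modality):
    """Aggregate per-feature Shapley values into per-modality scores.

    Assumption (not confirmed by the source): a modality's contribution is
    the sum, over its features, of the mean absolute Shapley value.
    """
    n_samples = len(shap_rows)
    totals = defaultdict(float)
    for j, name in enumerate(feature_names):
        # Mean absolute Shapley value of feature j across all samples.
        mean_abs = sum(abs(row[j]) for row in shap_rows) / n_samples
        totals[feature_modality[name]] += mean_abs
    return dict(totals)


# Hypothetical toy data: two participants, three features.
shap_rows = [
    [0.5, -1.0, 0.25],
    [-0.5, 1.0, -0.25],
]
feature_names = ["np_score", "mri_volume", "age"]
feature_modality = {"np_score": "NP", "mri_volume": "MRI", "age": "clinical"}

contrib = modality_contributions(shap_rows, feature_names, feature_modality)
print(contrib)
```

In practice the rows would come from a Shapley explainer applied to the fitted model. Whether to sum or average within a modality is a design choice that affects comparisons between modalities with different numbers of features.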
Results: The analysis included 1,031 participants with normal cognitive status at baseline (age 75±5 years, 55.3% women), of whom 205 were diagnosed with dementia during the 10-year follow-up. The model built on all three modalities achieved the best dementia prediction performance (area under the curve [AUC] 0.90±0.01), compared with single-modality models (AUC range: 0.82-0.84). MRI measures contributed most to dementia prediction (mean absolute Shapley value: 3.19); together with the performance gain over single-modality models, this supports the use of multimodal inputs.
Conclusion: This study shows that a multimodal machine learning framework achieves superior performance for 10-year dementia risk prediction. The model could be used to increase vigilance for cognitive deterioration and to select high-risk individuals for early intervention and risk management.
Keywords: Alzheimer’s disease; dementia risk prediction; machine learning; magnetic resonance imaging; multimodal data; neuropsychological test.