Multicenter and multi-scanner imaging studies may be necessary to obtain sample sizes large enough to develop accurate predictive models. However, because such studies combine differing research participant characteristics, MRI scanners, and image acquisition protocols, they can introduce confounding factors that hinder the creation of generalizable machine learning models. A model developed on one dataset may not transfer readily to another, so classification model generalizability is essential for producing reproducible results in multi-scanner and multicenter studies. This study aims to improve generalizability when classifying individual migraine patients and healthy controls from brain MRI data, using a data harmonization strategy. We propose identifying a 'healthy core', a group of homogeneous healthy controls with similar characteristics, from multicenter studies. The Maximum Mean Discrepancy (MMD) in Geodesic Flow Kernel (GFK) space is used to compare two datasets, capturing data variability and facilitating the identification of this 'healthy core'. Homogeneous healthy controls help mitigate unwanted heterogeneity, enabling highly accurate classification models that perform better on new datasets. We used two datasets: one comprising 120 individuals (66 with migraine and 54 healthy controls) and another comprising 76 individuals (34 with migraine and 42 healthy controls). Extensive experimental results underscore the benefits of leveraging a 'healthy core'. Notably, a homogeneous dataset derived from a cohort of healthy controls yielded a significant 25% improvement in accuracy for both episodic and chronic migraineurs.
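To illustrate how MMD quantifies the discrepancy between two datasets, the following is a minimal sketch of the biased squared-MMD estimator with a Gaussian (RBF) kernel. It assumes each dataset is a NumPy array of per-subject feature vectors (for example, hypothetical regional MRI measures), and the bandwidth parameter `gamma` is an illustrative assumption. It deliberately omits the GFK projection described above: here MMD is computed on the raw features, so this is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    # Clip tiny negative values caused by floating-point error before exponentiating.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2_biased(X, Y, gamma=1.0):
    # Biased estimate of squared MMD: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    # Assumption: raw features are used; the study computes MMD in GFK space instead.
    kxx = rbf_kernel(X, X, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2 * kxy

# Usage with synthetic data standing in for two sites' healthy-control features.
rng = np.random.default_rng(0)
site_a = rng.normal(size=(100, 5))
site_b_same = rng.normal(size=(100, 5))            # same distribution as site_a
site_b_shift = rng.normal(loc=2.0, size=(100, 5))  # simulated scanner/site shift
mmd_same = mmd2_biased(site_a, site_b_same, gamma=0.1)
mmd_shift = mmd2_biased(site_a, site_b_shift, gamma=0.1)
```

A larger value indicates that the two samples are less likely to come from the same distribution. In a healthy-core setting, one could compare candidate subsets of controls this way, but the selection procedure itself is not specified here.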
Copyright: © 2024 Yoon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.