Purpose: A lack of inter-method agreement can produce inconsistent results in neuroimaging studies. We evaluated the intra-method repeatability and the inter-method reproducibility of two widely used automatic segmentation methods for brain MRI: the FreeSurfer (FS) and the Statistical Parametric Mapping (SPM) software packages.
Methods: We segmented the gray matter (GM), the white matter (WM) and subcortical structures in test-retest MRI data of healthy volunteers from the Kirby-21 and OASIS datasets. We used Pearson's correlation (r), Bland-Altman plots and the Dice index to study intra-method repeatability and inter-method reproducibility. To test whether different processing methods affect the results of a neuroimaging-based group study, we carried out a statistical comparison between male and female volume measures.
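The three agreement measures named above can be sketched as follows. This is an illustration only, not the authors' pipeline: the function names, the toy inputs, and the choice of 1.96 standard deviations for the Bland-Altman limits of agreement are assumptions made here.

```python
import numpy as np


def dice_index(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumed convention: two empty masks overlap perfectly.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two volume series."""
    return float(np.corrcoef(x, y)[0, 1])


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) of agreement for x - y."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)


# Toy data (hypothetical volumes in millilitres, not from the study).
test_volumes = [610.0, 655.0, 702.0, 640.0]
retest_volumes = [612.0, 650.0, 700.0, 645.0]

r = pearson_r(test_volumes, retest_volumes)
bias, lower, upper = bland_altman(test_volumes, retest_volumes)
overlap = dice_index([1, 1, 0, 0], [1, 0, 1, 0])
```

In this sketch a bias whose limits of agreement exclude zero would correspond to a systematic difference between the two sets of measurements, whether test vs. retest or one method vs. the other.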
Results: A high correlation was found between test-retest volume measures for both SPM (r in the 0.98-0.99 range) and FS (r in the 0.95-0.99 range). A non-null bias between test-retest FS volumes was detected for GM and WM in the OASIS dataset. The inter-method reproducibility analysis yielded volume correlation values in the 0.72-0.98 range, and the overlap between the segmented structures, assessed by the Dice index, was in the 0.76-0.83 range. SPM systematically provided significantly greater GM volumes and lower WM and subcortical volumes than FS. In the male vs. female brain volume comparisons, inconsistencies arose for the OASIS dataset, where the gender-related differences appear subtler than in the Kirby-21 dataset.
Conclusions: The inter-method reproducibility should be evaluated before interpreting the results of neuroimaging studies.
Keywords: Brain MRI; Repeatability; Reproducibility; Segmentation.
Copyright © 2019 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.