Evaluation of the intra- and inter-method agreement of brain MRI segmentation software packages: A comparison between SPM12 and FreeSurfer v6.0

L Palumbo; P Bosco; M E Fantacci; E Ferrari; P Oliva; G Spera; A Retico

doi:10.1016/j.ejmp.2019.07.016

Phys Med. 2019 Aug:64:261-272. doi: 10.1016/j.ejmp.2019.07.016. Epub 2019 Aug 5.

Authors

L Palumbo¹, P Bosco², M E Fantacci³, E Ferrari⁴, P Oliva⁵, G Spera², A Retico²

Affiliations

¹ National Institute for Nuclear Physics (INFN), Pisa Division, Pisa, Italy.
² National Institute for Nuclear Physics (INFN), Pisa Division, Pisa, Italy.
³ University of Pisa, Physics Department, Pisa, Italy.
⁴ National Institute for Nuclear Physics (INFN), Pisa Division, Pisa, Italy; Scuola Normale Superiore, Pisa, Italy.
⁵ University of Sassari and INFN Cagliari Division, Italy.

PMID: 31515029
DOI: 10.1016/j.ejmp.2019.07.016

Abstract

Purpose: A lack of inter-method agreement can produce inconsistent results in neuroimaging studies. We evaluated the intra-method repeatability and the inter-method reproducibility of two widely used automatic segmentation methods for brain MRI: the FreeSurfer (FS) and the Statistical Parametric Mapping (SPM) software packages.

Methods: We segmented the gray matter (GM), the white matter (WM) and subcortical structures in test-retest MRI data of healthy volunteers from the Kirby-21 and OASIS datasets. We used Pearson's correlation (r), Bland-Altman plots and the Dice index to study intra-method repeatability and inter-method reproducibility. To test whether different processing methods affect the results of a neuroimaging-based group study, we carried out a statistical comparison between male and female volume measures.
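As an illustration of two of the agreement measures named in the Methods, the following is a minimal sketch of the Dice index between two binary segmentation masks and of Pearson's r between two sets of volume measures. The function names (`dice_index`, `pearson_r`) and the toy inputs are assumptions made for this example; they are not taken from the study, whose actual implementation is not described here.

```python
import numpy as np


def dice_index(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention,
        # assumed here for the sake of the example).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical toy masks standing in for one structure segmented twice.
seg_1 = np.array([1, 1, 0, 0])
seg_2 = np.array([1, 0, 0, 0])
dice = dice_index(seg_1, seg_2)

# Hypothetical toy volumes standing in for test and retest measures.
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A Dice index of 1 means the two masks coincide voxel by voxel and 0 means they do not overlap at all. Note that a high r only shows that two sets of volumes vary together; it does not rule out a systematic offset between them.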

Results: A high correlation was found between test-retest volume measures for both SPM (r in the 0.98-0.99 range) and FS (r in the 0.95-0.99 range). A non-null bias between test-retest FS volumes was detected for GM and WM in the OASIS dataset. The inter-method reproducibility analysis yielded volume correlation values in the 0.72-0.98 range, and the overlap between the segmented structures, assessed by the Dice index, was in the 0.76-0.83 range. SPM systematically provided significantly greater GM volumes and lower WM and subcortical volumes than FS. In the male vs. female brain volume comparisons, inconsistencies arose for the OASIS dataset, where the gender-related differences appear subtler than in the Kirby dataset.
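The test-retest bias reported above is the kind of quantity a Bland-Altman analysis exposes: the mean of the paired differences, together with limits of agreement around it. The sketch below shows the standard computation under the usual assumption of 1.96 sample standard deviations for the limits; the function name `bland_altman` and the toy volumes are hypothetical and do not reproduce the study's data or code.

```python
import numpy as np


def bland_altman(test, retest):
    """Return per-pair means, per-pair differences, bias, and limits of agreement.

    bias   = mean of (test - retest)
    limits = bias -/+ 1.96 * sample standard deviation of the differences
    """
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    if test.shape != retest.shape:
        raise ValueError("test and retest must have the same shape")
    diffs = test - retest
    means = (test + retest) / 2.0
    bias = float(diffs.mean())
    sd = float(diffs.std(ddof=1))  # sample standard deviation
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical toy volumes (arbitrary units), not values from the paper.
means, diffs, bias, limits = bland_altman([10.0, 12.0, 14.0], [9.0, 12.0, 13.0])
```

In a Bland-Altman plot, `diffs` is drawn against `means`, with horizontal lines at `bias` and at the two limits. A bias whose confidence interval excludes zero indicates a systematic difference between the two measurement sessions, even when the correlation between them is high.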

Conclusions: The inter-method reproducibility should be evaluated before interpreting the results of neuroimaging studies.

Keywords: Brain MRI; Repeatability; Reproducibility; Segmentation.

Publication types

Comparative Study

MeSH terms

Brain / diagnostic imaging*
Female
Humans
Image Processing, Computer-Assisted / methods*
Magnetic Resonance Imaging*
Male
Software*