Hippocampal volumetry derived from structural MRI is increasingly used to delineate regions of interest for functional measurements and to assess efficacy in therapeutic trials of Alzheimer's disease (AD), and it has been endorsed by the new AD diagnostic guidelines as a radiological marker of disease progression. Unfortunately, morphological heterogeneity in AD can prevent accurate demarcation of the hippocampus. Recent developments in automated volumetry commonly use multi-template fusion driven by expert manual labels, enabling highly accurate and reproducible segmentation in both diseased and healthy subjects. However, there are several protocols for defining the hippocampus anatomically in vivo, and the method used to generate atlases may affect the accuracy and sensitivity of automatic segmentation, particularly in pathologically heterogeneous samples. Here we report a fully automated segmentation technique that provides a robust platform for directly evaluating both technical and biomarker performance in AD across anatomically distinct labeling protocols. For the first time, we test head-to-head the performance of five common hippocampal labeling protocols for multi-atlas-based segmentation, using both the Sunnybrook Longitudinal Dementia Study and the entire Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) baseline and 24-month dataset. We based the atlas libraries on the protocols of Haller et al. (1997), Killiany et al. (1993), Malykhin et al. (2007), Pantel et al. (2000), and Pruessner et al. (2000); a single operator performed all manual tracings to generate de facto "ground truth" labels. All methods distinguished between normal elders, mild cognitive impairment (MCI), and AD in the expected directions and showed comparable correlations with measures of episodic memory performance. Only the more inclusive protocols distinguished between stable MCI and MCI-to-AD converters, and these protocols had slightly better associations with episodic memory.
Moreover, we demonstrate that protocols including more posterior anatomy and dorsal white matter compartments furnish the best voxel-overlap accuracies relative to expert manual tracings (Dice similarity coefficient = 0.87-0.89) and require the smallest sample sizes to power clinical trials in MCI and AD. When the caudal hippocampus and the alveus-fimbria compartment were excluded from a protocol, the greatest concentration of errors was localized to these regions. The definition of the medial body did not significantly alter accuracy among the more comprehensive protocols. Voxel-overlap accuracies between automatic and manual labels were lower for the more pathologically heterogeneous Sunnybrook study than for the ADNI-1 sample. Finally, accuracy appears to differ most among protocols in AD subjects, compared with MCI subjects and normal elders. Together, these results suggest that the choice of protocol for fully automatic multi-template-based segmentation in AD can influence both segmentation accuracy, as measured against expert manual labels, and performance as a biomarker in MCI and AD.
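For readers unfamiliar with the voxel-overlap measure reported above, the Dice similarity coefficient between an automatic label A and a manual label B is 2|A ∩ B| / (|A| + |B|). The following is a minimal illustrative sketch only, not the pipeline used in this study: the function name `dice_coefficient`, the flattened 0/1 mask representation, and the toy masks are all assumptions made for the example.

```python
def dice_coefficient(auto_mask, manual_mask):
    """Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 (or False/True) voxel labels of
    equal length; this flattened representation is an assumption made
    for illustration.
    """
    a = [bool(v) for v in auto_mask]
    m = [bool(v) for v in manual_mask]
    if len(a) != len(m):
        raise ValueError("masks must contain the same number of voxels")

    # Voxels labeled as hippocampus by both the automatic and manual masks.
    intersection = sum(1 for x, y in zip(a, m) if x and y)
    total = sum(a) + sum(m)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention
        # chosen here, not taken from the study).
        return 1.0
    return 2.0 * intersection / total


# Toy example: hypothetical 8-voxel masks, not real data.
auto = [0, 1, 1, 1, 1, 0, 0, 0]
manual = [0, 0, 1, 1, 1, 1, 0, 0]
print(dice_coefficient(auto, manual))  # 3 shared voxels, 4 + 4 labeled: 0.75
```

A value of 1 indicates identical labels and 0 indicates no overlap, so the 0.87-0.89 range reported above corresponds to high agreement with the manual tracings.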
Keywords: Alzheimer's disease; Automatic hippocampal segmentation; Hippocampal tracing protocol; Multi-atlas.
Copyright © 2012 Elsevier Inc. All rights reserved.