The impact of segmentation on whole-lung functional MRI quantification: Repeatability and reproducibility from multiple human observers and an artificial neural network

Corin Willers; Grzegorz Bauman; Simon Andermatt; Francesco Santini; Robin Sandkühler; Kathryn A Ramsey; Philippe C Cattin; Oliver Bieri; Orso Pusterla; Philipp Latzin

doi:10.1002/mrm.28476

Magn Reson Med. 2021 Feb;85(2):1079-1092. doi: 10.1002/mrm.28476. Epub 2020 Sep 6.

Authors

Corin Willers¹, Grzegorz Bauman²,³, Simon Andermatt³, Francesco Santini²,³, Robin Sandkühler³, Kathryn A Ramsey¹, Philippe C Cattin³, Oliver Bieri²,³, Orso Pusterla²,³,⁴, Philipp Latzin¹

Affiliations

¹ Division of Pediatric Respiratory Medicine, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.
² Division of Radiological Physics, Department of Radiology, University of Basel Hospital, Basel, Switzerland.
³ Department of Biomedical Engineering, University of Basel, Basel, Switzerland.
⁴ Institute for Biomedical Engineering, University and ETH Zurich, Zurich, Switzerland.

PMID: 32892445
DOI: 10.1002/mrm.28476

Abstract

Purpose: To investigate the repeatability and reproducibility of lung segmentation and their impact on the quantitative outcomes from functional pulmonary MRI. Additionally, to validate an artificial neural network (ANN) to accelerate whole-lung quantification.

Method: Ten healthy children and 25 children with cystic fibrosis underwent matrix pencil decomposition MRI (MP-MRI). Impaired relative fractional ventilation (R_FV) and relative perfusion (R_Q) from MP-MRI were compared using whole-lung segmentation performed by a physician at two time-points (A_t1 and A_t2), by an MRI technician (B), and by an ANN (C). Repeatability and reproducibility were assessed with the Dice similarity coefficient (DSC), paired t-test, and intraclass correlation coefficient (ICC).
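As an illustration of the overlap metric named above, the following is a minimal sketch of a Dice similarity coefficient between two binary segmentation masks, DSC = 2|A ∩ B| / (|A| + |B|). The function name, the flat-list mask representation, and the toy masks are assumptions made for this example; they are not the authors' implementation or study data.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are flat sequences of 0/1 (or False/True) values, one per voxel.
    This representation is an assumption made for the sketch.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy 1-D "masks" (illustrative values only, not study data).
observer_a = [0, 1, 1, 1, 1, 0, 0, 0]
observer_b = [0, 0, 1, 1, 1, 1, 0, 0]
dsc = dice_coefficient(observer_a, observer_b)
print(f"DSC = {dsc:.2f}")
```

A DSC of 1 means the two masks are identical and 0 means they share no voxels, so values such as 0.88 to 0.94 indicate high but not perfect overlap.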

Results: The repeatability within an observer (A_t1 vs A_t2) resulted in a DSC of 0.94 ± 0.01 (mean ± SD) and an unsystematic difference of -0.01% for R_FV (P = .92) and +0.1% for R_Q (P = .21). The reproducibility between human observers (A_t1 vs B) resulted in a DSC of 0.88 ± 0.02, and a systematic absolute difference of -0.81% (P < .001) for R_FV and -0.38% (P = .037) for R_Q. The reproducibility between human and the ANN (A_t1 vs C) resulted in a DSC of 0.89 ± 0.03 and a systematic absolute difference of -0.36% for R_FV (P = .017) and -0.35% for R_Q (P = .002). The ICC was >0.98 for all variables and comparisons.
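To make the notion of a systematic difference concrete, here is a hedged sketch of how a paired comparison between two segmentations could be computed: the mean of the per-subject differences and the paired t statistic, t = mean(d) / (SD(d) / sqrt(n)). The function name and the numbers are invented for illustration and are not study data; converting t to a P value requires the t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`), which is omitted here.

```python
import math


def paired_t_statistic(values_x, values_y):
    """Return (mean difference, t statistic) for paired samples y - x."""
    if len(values_x) != len(values_y):
        raise ValueError("paired samples must have equal length")
    n = len(values_x)
    if n < 2:
        raise ValueError("need at least two pairs")
    diffs = [y - x for x, y in zip(values_x, values_y)]
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff, mean_diff / standard_error


# Hypothetical impaired-ventilation percentages per subject from two
# segmentations (illustrative values only, not study data).
segmentation_1 = [10.0, 12.0, 9.0, 11.0]
segmentation_2 = [11.0, 12.5, 10.0, 11.5]
mean_diff, t_stat = paired_t_statistic(segmentation_1, segmentation_2)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}")
```

A mean difference that is small in absolute terms can still be systematic when the per-subject differences are consistent in sign and size, which is compatible with a high ICC alongside a significant paired test.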

Conclusions: Despite high overall agreement, there were systematic differences in lung segmentation between observers. This needs to be considered for longitudinal studies and could be overcome by using an ANN, which performs as well as human observers and fully automates MP-MRI post-processing.

Keywords: automated segmentation; functional lung MRI; inter-reader reproducibility; neural networks; pediatrics.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Child
Cystic Fibrosis* / diagnostic imaging
Humans
Lung / diagnostic imaging
Magnetic Resonance Imaging*
Neural Networks, Computer
Reproducibility of Results