A Multicenter, Scan-Rescan, Human and Machine Learning CMR Study to Test Generalizability and Precision in Imaging Biomarker Analysis

Circ Cardiovasc Imaging. 2019 Oct;12(10):e009214. doi: 10.1161/CIRCIMAGING.119.009214. Epub 2019 Sep 24.

Abstract

Background: Automated analysis of cardiac structure and function using machine learning (ML) has great potential, but is currently hindered by poor generalizability. Comparison is traditionally against clinicians as a reference, ignoring inherent human inter- and intraobserver error, and ensuring that ML cannot demonstrate superiority. Measuring precision (scan:rescan reproducibility) addresses this. We compared precision of ML and humans using a multicenter, multi-disease, scan:rescan cardiovascular magnetic resonance data set.

Methods: One hundred ten patients (5 disease categories, 5 institutions, 2 scanner manufacturers, and 2 field strengths) underwent scan:rescan cardiovascular magnetic resonance (96% within one week). After identification of the most precise human technique, left ventricular chamber volumes, mass, and ejection fraction were measured by an expert, a trained junior clinician, and a fully automated convolutional neural network trained on 599 independent multicenter disease cases. Scan:rescan coefficient of variation and 1000 bootstrapped 95% CIs were calculated and compared using mixed linear effects models.

Results: Clinicians can be confident in detecting a 9% change in left ventricular ejection fraction, with greater than half of coefficient of variation attributable to intraobserver variation. Expert, trained junior, and automated scan:rescan precision were similar (for left ventricular ejection fraction, coefficient of variation 6.1 [5.2%-7.1%], P=0.2581; 8.3 [5.6%-10.3%], P=0.3653; 8.8 [6.1%-11.1%], P=0.8620). Automated analysis was 186× faster than humans (0.07 versus 13 minutes).

Conclusions: Automated ML analysis is faster with similar precision to the most precise human techniques, even when challenged with real-world scan:rescan data. Assessment of multicenter, multi-vendor, multi-field strength scan:rescan data (available at www.thevolumesresource.com) permits a generalizable assessment of ML precision and may facilitate direct translation of ML to clinical practice.

Keywords: artificial intelligence; image processing; left ventricular remodeling; magnetic resonance imaging, cine; ventricular function.

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers / analysis*
  • Cardiovascular Diseases / diagnostic imaging*
  • Cardiovascular Diseases / physiopathology*
  • Female
  • Humans
  • Image Interpretation, Computer-Assisted
  • Machine Learning*
  • Magnetic Resonance Imaging / methods*
  • Male
  • Middle Aged
  • Reproducibility of Results
  • Stroke Volume
  • Ventricular Dysfunction, Left / diagnostic imaging*
  • Ventricular Dysfunction, Left / physiopathology*

Substances

  • Biomarkers