Integrating Eye Tracking and Speech Recognition Accurately Annotates MR Brain Images for Deep Learning: Proof of Principle

Joseph N Stember; Haydar Celik; David Gutman; Nathaniel Swinburne; Robert Young; Sarah Eskreis-Winkler; Andrei Holodny; Sachin Jambawalikar; Bradford J Wood; Peter D Chang; Elizabeth Krupinski; Ulas Bagci

doi:10.1148/ryai.2020200047

Radiol Artif Intell. 2020 Nov 11;3(1):e200047. doi: 10.1148/ryai.2020200047. eCollection 2021 Jan.

Affiliations

Department of Radiology, Memorial Sloan-Kettering Cancer Center, 1275 York Ave, New York, NY 10065 (J.N.S., D.G., N.S., R.Y., S.E.W., A.H.); The National Institutes of Health Clinical Center, Bethesda, Md (H.C., B.J.W.); Department of Radiology, Columbia University Medical Center, New York, NY (S.J.); Department of Radiology, University of California-Irvine, Irvine, Calif (P.D.C.); Department of Radiology & Imaging Sciences, Emory University, Atlanta, Ga (E.K.); and Center for Research in Computer Vision, University of Central Florida, Orlando, Fla (U.B.).

Abstract

Purpose: To generate and assess an algorithm combining eye tracking and speech recognition to extract brain lesion location labels automatically for deep learning (DL).

Materials and methods: In this retrospective study, 700 two-dimensional brain tumor MRI scans from the Brain Tumor Segmentation database were clinically interpreted. For each image, a single radiologist dictated a standard phrase describing the lesion into a microphone, simulating clinical interpretation. Eye-tracking data were recorded simultaneously. Speech recognition was used to identify the gaze points corresponding to each lesion. These lesion locations were used to train a keypoint detection convolutional neural network, which was then evaluated on localizing lesions in an independent test set of 85 images. The method was evaluated using percent accuracy.
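The pairing of dictation and gaze described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how gaze points were selected, so the time window around the recognized keyword, the use of a median, and the names `GazeSample` and `lesion_point_from_gaze` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class GazeSample:
    t: float  # seconds since the start of the recording
    x: float  # horizontal gaze position in image pixels
    y: float  # vertical gaze position in image pixels


def lesion_point_from_gaze(samples, keyword_time,
                           window_before=1.0, window_after=0.5):
    """Estimate a lesion location from gaze samples near a spoken keyword.

    keyword_time is the timestamp (in seconds) at which the speech
    recognizer reported the lesion phrase. The window sizes are
    hypothetical defaults, not values taken from the study.
    Returns an (x, y) tuple, or None if no sample falls in the window.
    """
    lo = keyword_time - window_before
    hi = keyword_time + window_after
    selected = [s for s in samples if lo <= s.t <= hi]
    if not selected:
        return None
    # The median is less sensitive than the mean to stray saccades.
    return (median(s.x for s in selected), median(s.y for s in selected))


samples = [
    GazeSample(0.0, 10, 10),     # looking elsewhere before dictation
    GazeSample(1.0, 100, 120),
    GazeSample(1.2, 102, 118),
    GazeSample(1.4, 104, 122),
    GazeSample(3.0, 300, 300),   # looking away after dictation
]
label = lesion_point_from_gaze(samples, keyword_time=1.5)
```

In this toy recording, only the three samples between 0.5 s and 2.0 s contribute to the label, so the stray fixations before and after the dictated phrase are ignored.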

Results: Eye tracking with speech recognition was 92% accurate in labeling lesion locations in the training dataset, demonstrating that fully simulated interpretation can yield reliable tumor location labels. These labels were then used to train the DL network, which predicted lesion locations in a separate test set with 85% accuracy.
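As an illustration of how a percent-accuracy figure of this kind could be computed, the sketch below counts a point as correct when it falls inside a reference lesion mask. That hit criterion is an assumption; the abstract does not define when a label or prediction counts as accurate. The function names `point_in_mask` and `percent_accuracy` are hypothetical.

```python
def point_in_mask(point, mask):
    """Return True if the (x, y) point lies on a nonzero mask pixel.

    mask is a list of rows (mask[row][col]); points outside the image
    are treated as misses.
    """
    x, y = point
    row, col = int(round(y)), int(round(x))
    if row < 0 or row >= len(mask) or col < 0 or col >= len(mask[0]):
        return False
    return bool(mask[row][col])


def percent_accuracy(points, masks):
    """Percentage of images whose point falls inside the lesion mask.

    A missing point (None) counts as a miss, assuming every image
    contains a lesion.
    """
    if len(points) != len(masks):
        raise ValueError("points and masks must have the same length")
    if not points:
        raise ValueError("at least one image is required")
    hits = sum(
        1 for p, m in zip(points, masks)
        if p is not None and point_in_mask(p, m)
    )
    return 100.0 * hits / len(points)


# Toy 4x4 mask with a 2x2 lesion in the centre, reused for four images.
lesion = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
points = [(1, 1), (3, 3), (2, 2), None]
accuracy = percent_accuracy(points, [lesion] * 4)
```

Two of the four toy points land inside the lesion, so the example evaluates to 50% under this criterion.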

Conclusion: The DL network was able to locate brain tumors on the basis of training data that were labeled automatically from simulated clinical image interpretation. © RSNA, 2020.

© 2020 by the Radiological Society of North America, Inc.

Grants and funding

P30 CA008748/CA/NCI NIH HHS/United States