Objectives: Electroencephalography (EEG) is a central part of the medical evaluation for patients with neurological disorders. Training an algorithm to label EEGs as normal vs abnormal is challenging because of EEG heterogeneity and dependence on contextual factors, including age and sleep stage. Our objectives were to validate, on an independent dataset, prior work suggesting that deep learning methods can discriminate between normal and abnormal EEGs; to understand whether age and sleep stage information can improve discrimination; and to understand what factors lead to errors.
Methods: We train a deep convolutional neural network on a heterogeneous set of 8522 routine EEGs from the Massachusetts General Hospital. We explore several strategies for optimizing model performance, including accounting for age and sleep stage.
Results: The area under the receiver operating characteristic curve (AUC) on an independent test set (n = 851) is 0.917. Including age (AUC = 0.924), or both age and sleep stage (AUC = 0.925), improves the AUC marginally, but the improvements are not statistically significant.
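The abstract does not say which statistical test was used to compare the AUCs. As an illustration only, one common way to check whether a small AUC gain is significant is a paired bootstrap over the test recordings. The sketch below assumes binary labels (1 = abnormal) and per-recording scores from two models; the function names and the choice of a percentile bootstrap are assumptions, not the authors' method.

```python
import numpy as np


def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one example of each class")
    # 1-based average rank of every score (ties share the mean of their ranks).
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """95% percentile interval for AUC(b) - AUC(a), resampling recordings.

    The same resampled indices are used for both models, so the
    comparison is paired. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    n = labels.size
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, size=n)
        sample = labels[idx]
        if sample.all() or not sample.any():
            continue  # skip resamples that contain only one class
        diffs.append(auc(sample, scores_b[idx]) - auc(sample, scores_a[idx]))
    low, high = np.percentile(diffs, [2.5, 97.5])
    return float(low), float(high)
```

If the resulting interval contains zero, the difference between the two models would not be considered significant at the 5% level under this procedure.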
Conclusions: The model architecture generalizes well to an independent dataset. Adding age and sleep stage to the model does not significantly improve performance.
Significance: Insights learned from misclassified examples, together with the minimal improvement from adding sleep stage and age, suggest fruitful directions for further research.
Keywords: Clinical neurophysiology; Computer aided diagnosis (CAD); Convolutional neural networks (CNN); Deep learning; Electroencephalograms (EEG); Epilepsy.
Copyright © 2018 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.