Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study

Sasank Chilamkurthy; Rohit Ghosh; Swetha Tanamala; Mustafa Biviji; Norbert G Campeau; Vasantha Kumar Venugopal; Vidur Mahajan; Pooja Rao; Prashant Warier

doi:10.1016/S0140-6736(18)31645-3

Lancet. 2018 Dec 1;392(10162):2388-2396. doi: 10.1016/S0140-6736(18)31645-3. Epub 2018 Oct 11.

Authors

Sasank Chilamkurthy¹, Rohit Ghosh², Swetha Tanamala², Mustafa Biviji³, Norbert G Campeau⁴, Vasantha Kumar Venugopal⁵, Vidur Mahajan⁵, Pooja Rao², Prashant Warier²

Affiliations

¹ Qure.ai, Goregaon East, Mumbai, India.
² Qure.ai, Goregaon East, Mumbai, India.
³ CT & MRI Center, Dhantoli, Nagpur, India.
⁴ Department of Radiology, Mayo Clinic, Rochester, MN, USA.
⁵ Centre for Advanced Research in Imaging, Neurosciences and Genomics, New Delhi, India.

PMID: 30318264

Abstract

Background: Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms. We aimed to develop and validate a set of deep learning algorithms for automated detection of the following key findings from these scans: intracranial haemorrhage and its types (ie, intraparenchymal, intraventricular, subdural, extradural, and subarachnoid); calvarial fractures; midline shift; and mass effect.

Methods: We retrospectively collected a dataset containing 313 318 head CT scans together with their clinical reports from around 20 centres in India between Jan 1, 2011, and June 1, 2017. A randomly selected part of this dataset (Qure25k dataset) was used for validation and the rest was used to develop algorithms. An additional validation dataset (CQ500 dataset) was collected in two batches from centres that were different from those used for the development and Qure25k datasets. We excluded postoperative scans and scans of patients younger than 7 years. The original clinical radiology report and consensus of three independent radiologists were considered as gold standard for the Qure25k and CQ500 datasets, respectively. Areas under the receiver operating characteristic curves (AUCs) were primarily used to assess the algorithms.
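As an illustration of the evaluation described in the Methods, the following minimal sketch derives a per-scan reference label from three radiologist reads and computes an AUC from model scores. It rests on assumptions that the abstract does not confirm: "consensus" is modelled here as a simple majority vote, the AUC is computed through the rank-based (Mann-Whitney) formulation, and the function names and toy data are hypothetical. It does not reproduce the authors' code or their confidence interval method.

```python
from typing import Sequence


def majority_vote(reads: Sequence[int]) -> int:
    """Return 1 if more than half of the binary reads are positive, else 0.

    Assumption: "consensus" is treated as a majority vote for illustration.
    """
    return int(sum(reads) * 2 > len(reads))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive scan receives a
    higher score than a randomly chosen negative scan (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three radiologist reads per scan and one model score.
reads_per_scan = [(1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 1, 0), (0, 0, 1)]
model_scores = [0.81, 0.10, 0.95, 0.40, 0.85]

reference = [majority_vote(r) for r in reads_per_scan]
print(reference)
print(auc(reference, model_scores))
```

The pairwise loop is quadratic in the number of scans, which is fine for a toy example; on a dataset the size of Qure25k a sort-based rank computation would be the more practical choice.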

Findings: The Qure25k dataset contained 21 095 scans (mean age 43 years; 9030 [43%] female patients), and the CQ500 dataset consisted of 214 scans in the first batch (mean age 43 years; 94 [44%] female patients) and 277 scans in the second batch (mean age 52 years; 84 [30%] female patients). On the Qure25k dataset, the algorithms achieved an AUC of 0·92 (95% CI 0·91-0·93) for detecting intracranial haemorrhage (0·90 [0·89-0·91] for intraparenchymal, 0·96 [0·94-0·97] for intraventricular, 0·92 [0·90-0·93] for subdural, 0·93 [0·91-0·95] for extradural, and 0·90 [0·89-0·92] for subarachnoid). On the CQ500 dataset, AUC was 0·94 (0·92-0·97) for intracranial haemorrhage (0·95 [0·93-0·98], 0·93 [0·87-1·00], 0·95 [0·91-0·99], 0·97 [0·91-1·00], and 0·96 [0·92-0·99], respectively). AUCs on the Qure25k dataset were 0·92 (0·91-0·94) for calvarial fractures, 0·93 (0·91-0·94) for midline shift, and 0·86 (0·85-0·87) for mass effect, while AUCs on the CQ500 dataset were 0·96 (0·92-1·00), 0·97 (0·94-1·00), and 0·92 (0·89-0·95), respectively.

Interpretation: Our results show that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention, opening up the possibility to use these algorithms to automate the triage process.

Funding: Qure.ai.

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Algorithms*
Brain Injuries / diagnostic imaging*
Datasets as Topic
Deep Learning*
Head / diagnostic imaging
Humans
Intracranial Hemorrhages / diagnostic imaging*
Retrospective Studies
Skull Fractures / diagnostic imaging*
Tomography, X-Ray Computed*
Trauma Severity Indices
Triage / methods*