Natural language processing to convert unstructured COVID-19 chest-CT reports into structured reports

Salvatore Claudio Fanni; Chiara Romei; Giovanni Ferrando; Federica Volpi; Caterina Aida D'Amore; Claudio Bedini; Sandro Ubbiali; Salvatore Valentino; Emanuele Neri

doi:10.1016/j.ejro.2023.100512

Eur J Radiol Open. 2023 Jul 25:11:100512. doi: 10.1016/j.ejro.2023.100512. eCollection 2023 Dec.

Authors

Salvatore Claudio Fanni¹, Chiara Romei², Giovanni Ferrando³, Federica Volpi¹, Caterina Aida D'Amore¹, Claudio Bedini⁴, Sandro Ubbiali³, Salvatore Valentino³, Emanuele Neri¹

Affiliations

¹ Department of Translational Research, Academic Radiology, University of Pisa, Pisa, Italy.
² Department of Diagnostic Imaging, 2nd Radiology Unit, Pisa University-Hospital, Pisa, Italy.
³ EBIT s.r.l. Esaote Group, Via di Caciolle, Florence, Italy.
⁴ EBIT s.r.l. Esaote Group, Via Melen 77, Genoa, Italy.

Abstract

Background: Structured reporting has been demonstrated to increase report completeness and to reduce the error rate, while also enabling data mining of radiological reports. Still, radiologists perceive structured reporting as a fragmented reporting style that limits their freedom of expression.

Purpose: A deep learning-based natural language processing method was developed to automatically convert unstructured COVID-19 chest CT reports into structured reports.

Methods: Two hundred and two COVID-19 chest CT examinations were retrospectively reviewed by two experienced radiologists, who wrote a free-form text radiological report for each exam and filled in, consistently with that report, the template provided by the Italian Society of Medical and Interventional Radiology, which was used as ground truth. A semi-supervised convolutional neural network was implemented to extract 62 categorical variables from the reports. Two iterations were carried out: the first without fine-tuning, the second with fine-tuning. Performance was measured using the mean accuracy and the mean F1 score. An error analysis was performed to identify errors entirely attributable to incorrect processing by the model.
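The two metrics named in the Methods can be illustrated with a minimal sketch. The abstract does not state how the F1 score was averaged, so this example assumes a macro-averaged F1 per variable, then an unweighted mean across variables. The variable names, category values, and helper functions below are hypothetical and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of reports where the extracted value equals the ground truth."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 (assumed averaging scheme)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_scores(truth, pred):
    """Average accuracy and macro-F1 over all template variables.

    truth / pred: dict mapping a variable name to a list of categorical
    values, one entry per report, in the same report order.
    """
    names = list(truth)
    accs = [accuracy(truth[n], pred[n]) for n in names]
    f1s = [macro_f1(truth[n], pred[n]) for n in names]
    return sum(accs) / len(accs), sum(f1s) / len(f1s)


# Hypothetical toy data: two variables, four reports.
example_truth = {
    "ground_glass": ["present", "absent", "present", "present"],
    "pleural_effusion": ["absent", "absent", "present", "absent"],
}
example_pred = {
    "ground_glass": ["present", "absent", "absent", "present"],
    "pleural_effusion": ["absent", "absent", "present", "absent"],
}

mean_acc, mean_f1 = mean_scores(example_truth, example_pred)
print(f"mean accuracy = {mean_acc:.3f}, mean F1 = {mean_f1:.3f}")
```

In the study's setting, `truth` would hold the values from the radiologist-filled template and `pred` the values extracted by the network, with one entry for each of the 62 variables.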

Results: In the first iteration, the algorithm achieved a mean accuracy of 93.7% and an F1 score of 93.8%, and 46% of the errors were exclusively attributable to wrong inference. In the second iteration, the model achieved 95.8% for both parameters, and the percentage of errors attributable to wrong inference decreased to 26%.

Conclusions: The convolutional neural network achieved optimal performance in the automated conversion of free-form text into structured radiological reports, overcoming the limitations attributed to structured reporting and paving the way for data mining of radiological reports.

Keywords: Artificial intelligence; COVID-19; Deep learning; Natural language processing; Structured reporting.