A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification

Himanshu S Sahoo; Greg M Silverman; Nicholas E Ingraham; Monica I Lupei; Michael A Puskarich; Raymond L Finzel; John Sartori; Rui Zhang; Benjamin C Knoll; Sijia Liu; Hongfang Liu; Genevieve B Melton; Christopher J Tignanelli; Serguei V S Pakhomov

doi:10.1093/jamiaopen/ooab070

JAMIA Open. 2021 Aug 7;4(3):ooab070. doi: 10.1093/jamiaopen/ooab070. eCollection 2021 Jul.

Affiliations

¹ Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, Minnesota, USA.
² Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA.
³ Pulmonary Disease and Critical Care Medicine, University of Minnesota, Minneapolis, Minnesota, USA.
⁴ Department of Anesthesiology, University of Minnesota, Minneapolis, Minnesota, USA.
⁵ Department of Emergency Medicine, University of Minnesota, Minneapolis, Minnesota, USA.
⁶ Department of Pharmaceutical Care and Health Systems, University of Minnesota, Minneapolis, Minnesota, USA.
⁷ Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA.
⁸ Department of Health Science Research, Mayo Clinic, Rochester, Minnesota, USA.

Abstract

Objective: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, which limit real-world integration with CDS. A potential solution to mitigate these issues is to use the rule-based gazetteer developed at our institution.

Materials and methods: Performance, resource utilization, and runtime of the rule-based gazetteer were compared with five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP, and MedTagger.

Results: This rule-based gazetteer was the fastest, had a low resource footprint, and achieved similar performance to the other annotation systems on weighted microaverage and macroaverage measures of precision, recall, and f1-score.
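
As an illustration of the two averaging schemes named above, the following is a minimal sketch of microaveraged and macroaveraged precision, recall, and f1-score for document-level symptom labels. It is not the evaluation code used in the study: the function names, the set-of-labels input format, and the example labels are assumptions made for illustration, and the weighting scheme mentioned in the results is not specified here and is not reproduced.

```python
from collections import Counter


def per_label_counts(gold, pred):
    """Count true positives, false positives, and false negatives per label.

    gold, pred: lists of sets of symptom labels, one set per document
    (hypothetical input format chosen for this sketch).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in p & g:
            tp[label] += 1
        for label in p - g:
            fp[label] += 1
        for label in g - p:
            fn[label] += 1
    return tp, fp, fn


def prf(tp, fp, fn):
    """Precision, recall, and f1 from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_macro(gold, pred):
    """Return (micro, macro), each a (precision, recall, f1) tuple."""
    tp, fp, fn = per_label_counts(gold, pred)
    labels = sorted(set(tp) | set(fp) | set(fn))
    # Microaverage: pool the counts over all labels, then compute once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macroaverage: compute per label, then take the unweighted mean.
    per_label = [prf(tp[l], fp[l], fn[l]) for l in labels]
    n = len(per_label)
    macro = tuple(sum(m[i] for m in per_label) / n if n else 0.0 for i in range(3))
    return micro, macro


# Toy example: a frequent label found every time, a rare label missed.
gold = [{"cough"}, {"cough"}, {"cough"}, {"fever"}]
pred = [{"cough"}, {"cough"}, {"cough"}, set()]
micro, macro = micro_macro(gold, pred)
```

In this toy example the microaverage is dominated by the frequent label, while the macroaverage gives the missed rare label equal weight, which is why reporting both is informative when symptom frequencies are imbalanced.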

Discussion: Opportunities to increase its performance include fine-tuning lexical rules for symptom identification. Additionally, it could run on multiple compute nodes for faster runtime.

Conclusion: This rule-based gazetteer overcame key technical limitations, facilitating real-time symptomatology identification for COVID-19 and integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of healthcare settings for surveillance of acute COVID-19 symptoms and for integration into prognostic modeling. Such a system is currently being leveraged for monitoring of postacute sequelae of COVID-19 (PASC) progression in COVID-19 survivors. This study conducted the first in-depth analysis and developed a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and similar weighted microaverage and macroaverage measures for precision, recall, and f1-score compared to industry-standard annotation systems.

Keywords: artificial intelligence; clinical decision support systems; follow-up studies; information extraction; natural language processing; signs and symptoms.
