Approaches for extracting daily dosage from free-text prescription signatures in heart failure with reduced ejection fraction: a comparative study

Theodorus S Haaker; Joshua S Choi; Claude J Nanjo; Phillip B Warner; Ameen Abu-Hanna; Kensaku Kawamoto

doi:10.1093/jamiaopen/ooae153

JAMIA Open. 2025 Jan 3;8(1):ooae153. doi: 10.1093/jamiaopen/ooae153. eCollection 2025 Feb.

Authors

Theodorus S Haaker¹,², Joshua S Choi¹,³, Claude J Nanjo¹, Phillip B Warner¹, Ameen Abu-Hanna², Kensaku Kawamoto¹

Affiliations

¹ Department of Biomedical Informatics, University of Utah, Salt Lake City, UT 84108, United States.
² Department of Medical Informatics, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands.
³ Department of Internal Medicine, University of Utah Health, Salt Lake City, UT 84112, United States.

Abstract

Objective: To compare various methods for extracting daily dosage information from prescription signatures (sigs) and identify the best performers.

Materials and methods: Five daily dosage extraction methods were identified and selected: Parsigs, RxSig, Sig2db, a large language model (LLM), and a bidirectional long short-term memory (BiLSTM) model. The methods were compared on positive predictive value (PPV), sensitivity, F1-score, computational cost, and run time, using a sig dataset in the context of heart failure with reduced ejection fraction.
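For readers less familiar with the three accuracy metrics named above, the following is a minimal sketch of their standard definitions. The function name and the example counts are hypothetical, and the abstract does not state how the study assigned sigs to true positives, false positives, or false negatives.

```python
def extraction_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Standard PPV, sensitivity, and F1 from confusion counts.

    tp: daily doses extracted that match the reference (assumed definition)
    fp: daily doses extracted that do not match the reference
    fn: sigs with a reference daily dose for which none was extracted
    """
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = ppv + sensitivity
    # F1 is the harmonic mean of PPV and sensitivity.
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0
    return {"ppv": ppv, "sensitivity": sensitivity, "f1": f1}


# Hypothetical counts, not taken from the study.
metrics = extraction_metrics(tp=90, fp=10, fn=30)
```

Because F1 is a harmonic mean, a method cannot reach a high F1-score unless both PPV and sensitivity are high.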

Results: The dataset consisted of 29 896 free-text sigs, which were split into training (70%) and validation (30%) sets. The BiLSTM model scored lowest, with an F1-score of 0.71. The LLM (GPT-4o) and the regular expression-based RxSig achieved the highest F1-scores, 0.98 and 0.95, respectively. The LLM outperformed RxSig in sensitivity, whereas RxSig outperformed the LLM in PPV. Additionally, RxSig had a shorter run time and incurred no cost, compared with a cost of 25 dollars for the LLM.
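To illustrate what a regular expression-based approach to daily dose extraction can look like in general, here is a toy sketch. It is not RxSig's rule set: the pattern, the frequency table, the function name, and the assumption that tablet strength is supplied separately are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny frequency vocabulary (administrations per day).
FREQ_PER_DAY = {
    "once daily": 1, "daily": 1, "qd": 1,
    "twice daily": 2, "twice a day": 2, "bid": 2,
    "three times daily": 3, "tid": 3,
}

# Longer phrases first so "twice daily" is preferred over the bare "daily".
_FREQ_ALTERNATION = "|".join(
    re.escape(k) for k in sorted(FREQ_PER_DAY, key=len, reverse=True)
)

SIG_PATTERN = re.compile(
    r"take\s+(?P<qty>\d+(?:\.\d+)?)\s+tablets?\b.*?\b(?P<freq>"
    + _FREQ_ALTERNATION + r")\b",
    re.IGNORECASE,
)


def daily_dose_mg(sig: str, strength_mg: float) -> Optional[float]:
    """Return tablets per dose x strength x doses per day, or None if no rule matches."""
    match = SIG_PATTERN.search(sig)
    if match is None:
        return None  # abstain rather than guess
    per_day = FREQ_PER_DAY[match["freq"].lower()]
    return float(match["qty"]) * strength_mg * per_day
```

For example, `daily_dose_mg("Take 1 tablet by mouth twice daily", 25.0)` returns `50.0`, while a sig such as "Use as directed" returns `None`. Abstaining when no rule matches is one way a rule-based extractor can favor PPV at the expense of sensitivity.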

Discussion: In practical use, an algorithm should preferably score high on PPV and F1-score, to reduce false-positive assertions of daily dosage. Additionally, long run times and high costs do not scale to larger datasets. Thus, RxSig is likely the most scalable approach. Further research is needed to investigate the generalizability of the findings.

Conclusion: This study demonstrates that both the LLM and RxSig models excel in daily dose extraction from free-text sigs, with the RxSig model appearing to be the more scalable approach.

Keywords: clinical information extraction; daily dosage; daily dosage extraction; electronic prescribing; structured medication order.