Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data
Authors:
Basil Kaufmann,
Dallin Busby,
Chandan Krushna Das,
Neeraja Tillu,
Mani Menon,
Ashutosh K. Tewari,
Michael A. Gorin
Abstract:
Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records. Materials and Methods: A data abstraction tool based on the GPT-3.5 model from OpenAI was developed and compared to three physician human abstractors in…
▽ More
Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records. Materials and Methods: A data abstraction tool based on the GPT-3.5 model from OpenAI was developed and compared to three physician human abstractors in terms of time to task completion and accuracy for abstracting data on 14 unique variables from a set of 199 de-identified radical prostatectomy pathology reports. The reports were processed by the software tool in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. The tool was assessed for superiority for data abstraction speed and non-inferiority for accuracy. Results: The human abstractors required a mean of 101s per report for data abstraction, with times varying from 15 to 284 s. In comparison, the software tool required a mean of 12.8 s to process the vectorized reports and a mean of 15.8 to process the scanned reports (P < 0.001). The overall accuracies of the three human abstractors were 94.7%, 97.8%, and 96.4% for the combined set of 2786 datapoints. The software tool had an overall accuracy of 94.2% for the vectorized reports, proving to be non-inferior to the human abstractors at a margin of -10% ($α$=0.025). The tool had a slightly lower accuracy of 88.7% using the scanned reports, proving to be non-inferiority to 2 out of 3 human abstractors. Conclusion: The developed zero-shot learning NLP tool affords researchers comparable levels of accuracy to that of human abstractors, with significant time savings benefits. Because of the lack of need for task-specific model training, the developed tool is highly generalizable and can be used for a wide variety of data abstraction tasks, even outside the field of medicine.
△ Less
Submitted 23 July, 2023;
originally announced August 2023.
A Discontinuous Neural Network for Non-Negative Sparse Approximation
Authors:
Martijn Arts,
Marius Cordts,
Monika Gorin,
Marc Spehr,
Rudolf Mathar
Abstract:
This paper investigates a discontinuous neural network which is used as a model of the mammalian olfactory system and can more generally be applied to solve non-negative sparse approximation problems. By inherently limiting the systems integrators to having non-negative outputs, the system function becomes discontinuous since the integrators switch between being inactive and being active. It is sh…
▽ More
This paper investigates a discontinuous neural network which is used as a model of the mammalian olfactory system and can more generally be applied to solve non-negative sparse approximation problems. By inherently limiting the systems integrators to having non-negative outputs, the system function becomes discontinuous since the integrators switch between being inactive and being active. It is shown that the presented network converges to equilibrium points which are solutions to general non-negative least squares optimization problems. We specify a Caratheodory solution and prove that the network is stable, provided that the system matrix has full column-rank. Under a mild condition on the equilibrium point, we show that the network converges to its equilibrium within a finite number of switches. Two applications of the neural network are shown. Firstly, we apply the network as a model of the olfactory system and show that in principle it may be capable of performing complex sparse signal recovery tasks. Secondly, we generalize the application to include non-negative sparse approximation problems and compare the recovery performance to a classical non-negative basis pursuit denoising algorithm. We conclude that the recovery performance differs only marginally from the classical algorithm, while the neural network has the advantage that no performance critical regularization parameter has to be chosen prior to recovery.
△ Less
Submitted 21 March, 2016;
originally announced March 2016.