Background: Individual case review of spontaneous adverse event (AE) reports remains a cornerstone of medical product safety surveillance for industry and regulators. We previously developed the Vaccine Adverse Event Text Miner (VaeTM) to automate information extraction, with the aim of accelerating the evaluation of large volumes of unstructured data and facilitating signal detection.
Objective: To assess how the information extraction performed by VaeTM affects the accuracy of a medical expert's review of vaccine adverse event reports.
Methods: The "outcome of interest" (diagnosis, cause of death, second level diagnosis), "onset time," and "alternative explanations" (drug, medical and family history) for the adverse event were extracted from 1000 reports from the Vaccine Adverse Event Reporting System (VAERS) using the VaeTM system. For these three variables, we compared medical experts' interpretation of the VaeTM-extracted data with their interpretation of the traditional full-text reports. Two experienced clinicians alternately reviewed text miner output and full text. A third clinician scored the match rate using a predefined algorithm; the proportion of matches and 95% confidence intervals (CI) were calculated. Review time per report was analyzed.
Results: The proportion of matches between the interpretation of the VaeTM-extracted data and the interpretation of the full text was 93% for outcome of interest (95% CI: 91-94%) and 78% for alternative explanation (95% CI: 75-81%). Extracted data on the time to onset were used in 14% of cases and matched in 54% (95% CI: 46-63%) of those cases. When supported by structured time data from reports, the match for time to onset was 79% (95% CI: 76-81%). The extracted text averaged 136 fewer words (74%), resulting in a mean reduction in review time of 50 seconds (58%) per report.
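As an illustration of how a match proportion and its 95% CI can be computed, the following is a minimal sketch using the normal-approximation (Wald) interval. The abstract does not state which interval method was used, so the choice of method, the function name `wald_ci`, and the count of 930 matches out of 1000 reports (back-calculated from the rounded 93%) are assumptions made for illustration only.

```python
import math


def wald_ci(matches, total, z=1.96):
    """Return (proportion, lower, upper) for a binomial proportion.

    Uses the normal-approximation (Wald) interval; z=1.96 corresponds
    to a two-sided 95% confidence level. This is an assumed method,
    not necessarily the one used in the study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = matches / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 930 of 1000 reports scored as a match.
p, lower, upper = wald_ci(930, 1000)
print(f"{p:.1%} (95% CI: {lower:.1%}-{upper:.1%})")
```

Other interval methods (for example Wilson or exact binomial) give slightly different bounds, particularly for small denominators such as the subset of reports in which extracted onset-time data were used.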
Conclusion: Despite a 74% reduction in words, the clinical conclusion from VaeTM-extracted data agreed with the full text in 93% and 78% of reports for the outcome of interest and alternative explanation, respectively. The limited amount of extracted time interval data indicates the need for further development of this feature. VaeTM may improve review efficiency, but further study is needed to determine whether this level of agreement is sufficient for routine use.
Keywords: Biosurveillance; data mining; natural language processing; postmarketing; product surveillance.