Analysis of literature-derived duplicate records in the FDA Adverse Event Reporting System (FAERS) database

Can J Physiol Pharmacol. 2024 Dec 4. doi: 10.1139/cjpp-2024-0078. Online ahead of print.

Abstract

The FDA Adverse Event Reporting System (FAERS) is a large-scale repository of reports of adverse drug events (ADEs). The same published clinical study or case report may be reviewed by multiple companies or healthcare professionals and reported separately to the FDA, which leads to a substantial number of duplicate reports in FAERS. These duplicates can produce false associations between a given drug and an ADE. In this study, we first assessed the consistency of drug and ADE information across FAERS reports from patients with Alzheimer's disease. Drug-related information was more consistent than ADE-related information, likely because ADEs are described with a more heterogeneous variety of terms and phrases. We then showed that text comparison methods can identify duplicate records from their literature citations, evaluating the overall performance of 10 different comparison functions. Token-based methods (such as COSINE, QGRAM, and JACCARD), edit-based methods (including OSA, optimal string alignment; LV, Levenshtein; and DL, Damerau-Levenshtein), and the sequence-based LCS (longest common subsequence) method accurately detected identical publications in free text, with both high sensitivity and high specificity. These results can help identify duplicate FAERS reports and improve the reliability of detected associations between drugs and ADEs.

Keywords: FAERS Dashboard; PubMed; adverse drug events; pharmacovigilance; text comparison.
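The comparison functions named in the abstract can be illustrated with a minimal Python sketch. It assumes the literature citations have already been extracted from FAERS reports as plain strings; the `normalize` step and every function name below are hypothetical illustrations, not the study's code. The distances follow their common definitions (the abbreviations OSA, LV, DL, LCS, QGRAM, COSINE, and JACCARD match the method names of the R `stringdist` package, although the abstract does not say which software was used). Full Damerau-Levenshtein (DL) is not implemented here.

```python
import math
import re
from collections import Counter


def normalize(citation: str) -> str:
    """Hypothetical preprocessing: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", citation.lower())
    return " ".join(text.split())


def osa_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Optimal string alignment (OSA) distance; with transpositions=False this is Levenshtein (LV)."""
    # d[i][j] = minimum number of edits turning a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def lcs_distance(a: str, b: str) -> int:
    """LCS distance: len(a) + len(b) - 2 * (length of the longest common subsequence)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return len(a) + len(b) - 2 * prev[-1]


def qgrams(s: str, q: int = 2) -> Counter:
    """Count the overlapping character q-grams (tokens) of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a: str, b: str, q: int = 2) -> int:
    """Sum of absolute differences between the two q-gram count profiles."""
    va, vb = qgrams(a, q), qgrams(b, q)
    return sum(abs(va[g] - vb[g]) for g in va.keys() | vb.keys())


def jaccard_distance(a: str, b: str, q: int = 2) -> float:
    """1 - |shared q-grams| / |all distinct q-grams|."""
    ga, gb = set(qgrams(a, q)), set(qgrams(b, q))
    if not ga and not gb:
        return 0.0
    return 1.0 - len(ga & gb) / len(ga | gb)


def cosine_distance(a: str, b: str, q: int = 2) -> float:
    """1 - cosine similarity of the q-gram count vectors."""
    va, vb = qgrams(a, q), qgrams(b, q)
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(v * v for v in va.values()) * sum(v * v for v in vb.values()))
    if norm == 0:
        return 0.0 if va == vb else 1.0
    return 1.0 - dot / norm


if __name__ == "__main__":
    # Hypothetical citation strings for illustration only; not taken from FAERS.
    r1 = normalize("Doe J, et al. Example case report. J Example Med. 2020;12(3):45-50.")
    r2 = normalize("DOE J et al: Example case report; J. Example Med 2020, 12(3), 45-50")
    r3 = normalize("Roe A, et al. Another unrelated study. Example Pharmacol. 2019;7:1-9.")
    for label, x, y in [("same paper", r1, r2), ("different paper", r1, r3)]:
        print(label, osa_distance(x, y), lcs_distance(x, y), qgram_distance(x, y),
              round(jaccard_distance(x, y), 3), round(cosine_distance(x, y), 3))
```

In this sketch, two reports would be flagged as candidate duplicates when the distance between their normalized citations falls below a threshold; any such threshold is an assumption to be tuned against labeled report pairs, not a value reported by the study. Normalization matters: the two differently punctuated versions of the same hypothetical citation above become identical strings before any distance is computed.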