Purpose: To evaluate the impact of multiple design criteria for reference sets that are used to quantitatively assess the performance of pharmacovigilance signal detection algorithms (SDAs) for drug-drug interactions (DDIs).
Methods: Starting from a large and diversified reference set for two-way DDIs, we generated custom-made reference sets of various sizes considering multiple design criteria (e.g., adverse event background prevalence). We assessed differences observed in the performance metrics of three SDAs when applied to FDA Adverse Event Reporting System (FAERS) data.
Results: For some design criteria (e.g., theoretical evidence associated with positive controls), the impact on the performance metrics was negligible across the SDAs, while others (e.g., restriction to designated medical events, event background prevalence) seemed to have opposing effects of different sizes on the area under the curve (AUC) and positive predictive value (PPV) estimates.
Conclusions: The relative composition of reference sets can significantly impact the evaluation metrics, potentially altering conclusions about which methodologies perform best. We therefore need to select controls carefully, both to avoid misinterpreting signals triggered by confounding factors rather than true associations and to avoid biasing the evaluation by "favoring" some algorithms while penalizing others.
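One way to see why the composition of a reference set matters is to hold an algorithm's sensitivity and specificity fixed and vary only the ratio of positive to negative controls. The sketch below is purely illustrative and is not the authors' evaluation code: the function name `ppv`, the sensitivity and specificity values, and the control counts are all hypothetical. It shows only the general property that PPV depends on the share of positive controls in the reference set.

```python
def ppv(sensitivity, specificity, n_pos, n_neg):
    """Positive predictive value for a reference set with n_pos positive
    and n_neg negative controls (all inputs here are hypothetical)."""
    true_pos = sensitivity * n_pos            # positive controls flagged as signals
    false_pos = (1.0 - specificity) * n_neg   # negative controls flagged as signals
    return true_pos / (true_pos + false_pos)


# Same hypothetical algorithm, two reference sets differing only in composition.
balanced = ppv(0.6, 0.9, n_pos=100, n_neg=100)
imbalanced = ppv(0.6, 0.9, n_pos=100, n_neg=1000)
print(f"balanced set:   PPV = {balanced:.3f}")
print(f"imbalanced set: PPV = {imbalanced:.3f}")
```

With these assumed inputs, the algorithm's behavior is unchanged, yet its PPV is lower on the reference set that contains more negative controls.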
Keywords: adverse events; drug-drug interactions; performance metrics; pharmacovigilance; postmarketing surveillance; signal detection; spontaneous reports data.
© 2023 The Authors. Pharmacoepidemiology and Drug Safety published by John Wiley & Sons Ltd.