Epidemiologic research using probabilistic outcome definitions

Pharmacoepidemiol Drug Saf. 2015 Jan;24(1):19-26. doi: 10.1002/pds.3706. Epub 2014 Sep 25.

Abstract

Background: Epidemiologic studies using electronic healthcare data often define the presence or absence of binary clinical outcomes by using algorithms with imperfect specificity, sensitivity, and positive predictive value. Such algorithms misclassify outcomes and thereby bias study results.

Methods: We describe and evaluate a new method called probabilistic outcome definition (POD) that uses logistic regression to estimate the probability of a clinical outcome using multiple potential algorithms and then uses multiple imputation to make valid inferences about the risk ratio or other epidemiologic parameters of interest. We conducted a simulation to evaluate the performance of the POD method with two variables that can predict the true outcome and compared the POD method with the conventional method.
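The two POD steps described above (a logistic model for the outcome probability, then multiple imputation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions that the abstract does not state: the true outcome is known in a validated subsample, exposure is included in the imputation model, and all sample sizes, risks, and algorithm accuracies are invented for illustration. The helper names `log_rr` and `fit_logistic` are hypothetical, and the confidence interval uses a normal approximation rather than Rubin's degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cohort; every number below is an illustrative assumption.
n, true_rr = 100_000, 2.0
exposed = rng.integers(0, 2, n)
y_true = (rng.random(n) < 0.10 * np.where(exposed == 1, true_rr, 1.0)).astype(int)
# Two imperfect algorithms, nondifferential with respect to exposure.
a1 = rng.random(n) < np.where(y_true == 1, 0.70, 0.05)
a2 = rng.random(n) < np.where(y_true == 1, 0.60, 0.10)
# Assumed validation subsample: the only place y_true is used below.
validated = rng.random(n) < 0.20

def log_rr(y, e):
    """Log risk ratio (exposed vs unexposed) and its large-sample variance."""
    n1, n0 = (e == 1).sum(), (e == 0).sum()
    c1, c0 = y[e == 1].sum(), y[e == 0].sum()
    return np.log((c1 / n1) / (c0 / n0)), 1 / c1 - 1 / n1 + 1 / c0 - 1 / n0

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; returns coefficients and covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ beta))
    return beta, np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))

# Step 1: model P(outcome | algorithms, exposure) in the validated subsample.
X = np.column_stack([np.ones(n), a1, a2, exposed]).astype(float)
beta_hat, cov = fit_logistic(X[validated], y_true[validated])

# Step 2: multiple imputation of the outcome for non-validated patients.
M = 20
est, var = [], []
for _ in range(M):
    beta_m = rng.multivariate_normal(beta_hat, cov)  # propagate model uncertainty
    p = 1 / (1 + np.exp(-X @ beta_m))
    y_imp = np.where(validated, y_true, rng.random(n) < p)
    q, u = log_rr(y_imp, exposed)
    est.append(q)
    var.append(u)

# Rubin's rules: pooled estimate, within- plus between-imputation variance.
q_bar = np.mean(est)
total_var = np.mean(var) + (1 + 1 / M) * np.var(est, ddof=1)
rr_pod = np.exp(q_bar)
ci = np.exp(q_bar + np.array([-1.96, 1.96]) * np.sqrt(total_var))

# Conventional method for comparison: treat algorithm 1 as the outcome.
rr_conv = np.exp(log_rr(a1.astype(int), exposed)[0])

print(f"true RR {true_rr}, conventional {rr_conv:.2f}, POD {rr_pod:.2f}, 95% CI {ci.round(2)}")
```

In this sketch the exposure term in the imputation model matters: imputing the outcome from the algorithms alone would be expected to pull the risk ratio toward the null.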

Results: In the simulation, when the true risk ratio is equal to 1.0 (the null), the conventional method based on a binary outcome provides unbiased estimates. When the risk ratio is not equal to 1.0, however, the conventional method is biased whenever the positive predictive value is below 100%, whether one or both predictive variables are used to define the outcome, and the bias is very severe when the sensitivity or positive predictive value is poor (less than 0.75 in our simulation). In contrast, the POD method provides unbiased estimates of the risk ratio whether or not the true value is equal to 1.0, and it continues to do so even when the sensitivity and positive predictive value are low.
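A short derivation may help explain the pattern reported for the conventional method. It rests on assumptions not stated in the abstract: a single binary algorithm A whose sensitivity Se and specificity Sp are the same in exposed (E = 1) and unexposed (E = 0) patients, with true risks r_1 and r_0. These symbols are introduced here for illustration only.

```latex
\begin{align*}
\Pr(A = 1 \mid E = e) &= \mathit{Se}\, r_e + (1 - \mathit{Sp})(1 - r_e), \\
RR_{\text{obs}} &= \frac{\mathit{Se}\, r_1 + (1 - \mathit{Sp})(1 - r_1)}
                       {\mathit{Se}\, r_0 + (1 - \mathit{Sp})(1 - r_0)} .
\end{align*}
```

If r_1 = r_0, the numerator and denominator are identical and the observed risk ratio is 1.0, so there is no bias at the null. If Sp = 1 (no false positives, hence a positive predictive value of 100%), Se cancels and the observed risk ratio equals r_1 / r_0. With false positives the two no longer agree: for the illustrative values Se = 0.70, Sp = 0.95, r_0 = 0.10, and r_1 = 0.20, the observed risk ratio is 0.18 / 0.115, or about 1.57, against a true value of 2.0.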

Conclusions: The POD method provides an improved way to define outcomes in database research. Its major advantages over the conventional method are that it provides unbiased estimates of risk ratios and is easy to use.

Keywords: bias; database research; epidemiological methods; multiple imputation; outcome validation; pharmacoepidemiology; probabilistic outcome definition.

MeSH terms

  • Databases, Factual / statistics & numerical data*
  • Epidemiologic Methods*
  • Epidemiologic Studies
  • Humans
  • Probability*
  • Treatment Outcome*