Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials: a meta-epidemiological study

Cochrane Database Syst Rev. 2024 Jan 4;1(1):MR000034. doi: 10.1002/14651858.MR000034.pub3.

Abstract

Background: Researchers and decision-makers often use evidence from randomised controlled trials (RCTs) to determine the efficacy or effectiveness of a treatment or intervention. Studies with observational designs are commonly used to measure the effectiveness of an intervention in 'real world' scenarios. Numerous study designs and their modifications (including both randomised and observational designs) are used for comparative effectiveness research in an attempt to give an unbiased estimate of whether one treatment is more effective or safer than another for a particular population. An up-to-date systematic analysis is needed to identify differences in effect estimates from RCTs and observational studies. This updated review summarises the results of methodological reviews that compared the effect estimates of observational studies with those of RCTs from evidence syntheses addressing the same health research question.

Objectives: To assess and compare synthesised effect estimates by study type, contrasting RCTs with observational studies. To explore factors that might explain differences in synthesised effect estimates from RCTs versus observational studies (e.g. heterogeneity, type of observational study design, type of intervention, and use of propensity score adjustment). To identify gaps in the existing research comparing effect estimates across different study types.

Search methods: We searched MEDLINE, the Cochrane Database of Systematic Reviews, Web of Science databases, and Epistemonikos to May 2022. We checked references, conducted citation searches, and contacted review authors to identify additional reviews.

Selection criteria: We included systematic methodological reviews that compared quantitative effect estimates measuring the efficacy or effectiveness of interventions tested in RCTs with those from observational studies, including retrospective and prospective cohort, case-control, and cross-sectional designs. Reviews were not eligible if they compared RCTs with studies that had used some form of concurrent allocation.

Data collection and analysis: Using results from observational studies as the reference group, we examined relative summary effect estimates (risk ratios (RRs), odds ratios (ORs), hazard ratios (HRs), mean differences (MDs), and standardised mean differences (SMDs)) to evaluate whether effects were relatively larger or smaller, expressed as a ratio of odds ratios (ROR), ratio of risk ratios (RRR), ratio of hazard ratios (RHR), or difference in (standardised) mean differences (D(S)MD). If an included review did not provide an estimate comparing results from RCTs with observational studies, we generated one by pooling the estimates for observational studies and RCTs separately. Across all reviews, we synthesised these ratios to produce a pooled ratio of ratios comparing effect estimates from RCTs with those from observational studies. In overviews of reviews, we estimated the ROR or RRR for each overview using observational studies as the reference category. We appraised the risk of bias in the included reviews using nine criteria in total. To receive an overall low risk of bias rating, an included review needed to meet four of these: explicit criteria for study selection, a complete sample of studies, control for methodological differences between studies, and control for study heterogeneity. We assessed reviews/overviews not meeting all four criteria as having an overall high risk of bias. We assessed the certainty of the evidence, consisting of multiple evidence syntheses, with the GRADE approach.
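
The review publishes no analysis code, but the pooling step described above can be made concrete. The following is a minimal sketch in Python, assuming simple fixed-effect inverse-variance pooling of log odds ratios within each design (the included reviews may well have used random-effects models); all study data, function names, and numbers here are hypothetical, chosen only to illustrate how a ratio of odds ratios with observational studies as the reference group is formed and given a confidence interval.

    import math

    def pool_log_or(log_ors, ses):
        # Fixed-effect inverse-variance pooling: weight each study by 1/SE^2.
        weights = [1.0 / se ** 2 for se in ses]
        pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
        return pooled, math.sqrt(1.0 / sum(weights))

    def ratio_of_odds_ratios(rct, obs):
        # ROR = pooled OR(RCTs) / pooled OR(observational studies),
        # with observational studies as the reference group, as in the review.
        log_rct, se_rct = pool_log_or(*rct)
        log_obs, se_obs = pool_log_or(*obs)
        log_ror = log_rct - log_obs
        se_ror = math.sqrt(se_rct ** 2 + se_obs ** 2)  # groups assumed independent
        ci = (math.exp(log_ror - 1.96 * se_ror),
              math.exp(log_ror + 1.96 * se_ror))
        return math.exp(log_ror), ci

    # Hypothetical studies: (log odds ratios, standard errors) per design.
    rct = ([math.log(0.70), math.log(0.85), math.log(0.78)], [0.20, 0.25, 0.15])
    obs = ([math.log(0.62), math.log(0.70), math.log(0.66), math.log(0.74)],
           [0.12, 0.18, 0.10, 0.16])
    ror, (low, high) = ratio_of_odds_ratios(rct, obs)
    print(f"ROR = {ror:.2f} (95% CI {low:.2f} to {high:.2f})")

The same construction carries over to RRRs and RHRs by substituting log risk ratios or log hazard ratios for the log odds ratios.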

Main results: We included 39 systematic reviews and eight overviews of reviews, for a total of 47; 34 of these contributed data to our primary analysis. Based on the available data, the reviews/overviews included 2869 RCTs involving 3,882,115 participants, and 3924 observational studies with 19,499,970 participants. We rated 11 reviews/overviews as having an overall low risk of bias, and 36 as having an unclear or high risk of bias. Our main concerns with the included reviews/overviews were that some did not assess the quality of their included studies, and some failed to account appropriately for differences between study designs; for example, they conducted aggregate analyses of all observational studies rather than separate analyses of cohort and case-control studies. When pooling RORs and RRRs, the ratio of ratios indicated no difference or a very small difference between the effect estimates from RCTs and those from observational studies (ratio of ratios 1.08, 95% confidence interval (CI) 1.01 to 1.15). We rated the certainty of the evidence as low. Twenty-three of 34 reviews reported effect estimates of RCTs and observational studies that were, on average, in agreement. Small differences in the effect estimates were detected in several subgroup analyses:

  • pharmaceutical interventions only (ratio of ratios 1.12, 95% CI 1.04 to 1.21);
  • RCTs and observational studies with substantial or high heterogeneity, that is, I² ≥ 50% (ratio of ratios 1.11, 95% CI 1.04 to 1.18);
  • no use (ratio of ratios 1.07, 95% CI 1.03 to 1.11) or unclear use (ratio of ratios 1.13, 95% CI 1.03 to 1.25) of propensity score adjustment in observational studies; and
  • observational studies without further specification of the study design (ratio of ratios 1.06, 95% CI 0.96 to 1.18).

We detected no clear difference in other subgroup analyses.
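
To make the headline figure concrete: with observational studies as the reference group, the pooled ratio of ratios of 1.08 means that, on the ratio scale, effect estimates from RCTs were on average about 8% larger than those from observational studies. As a purely illustrative calculation, if a hypothetical meta-analysis of observational studies gave a pooled OR of 0.60, a ratio of ratios of 1.08 would imply a corresponding pooled RCT OR of about 0.60 × 1.08 ≈ 0.65. Only the 1.08 comes from this review; the 0.60 is invented, and any recoding of effect direction applied in the review's analyses is ignored here.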

Authors' conclusions: We found no difference or a very small difference between effect estimates from RCTs and observational studies. These findings are largely consistent with findings from recently published research. Factors other than study design need to be considered when exploring reasons for a lack of agreement between results of RCTs and observational studies, such as differences in the population, intervention, comparator, and outcomes investigated in the respective studies. Our results underscore that it is important for review authors to consider not only study design, but also the level of heterogeneity in meta-analyses of RCTs or observational studies. A better understanding is needed of how these factors might yield estimates reflective of true effectiveness.

Publication types

  • Comparative Study
  • Meta-Analysis

MeSH terms

  • Bias
  • Case-Control Studies
  • Delivery of Health Care*
  • Humans
  • Observational Studies as Topic / methods
  • Outcome Assessment, Health Care
  • Randomized Controlled Trials as Topic
  • Systematic Reviews as Topic