Quality and risk of bias appraisals of systematic reviews are inconsistent across reviewers and centers

J Clin Epidemiol. 2020 Sep:125:9-15. doi: 10.1016/j.jclinepi.2020.04.026. Epub 2020 May 19.

Abstract

Objective: The objective of the study was to evaluate the inter-rater and intercenter reliability, usability, and utility of A MeaSurement Tool to Assess systematic Reviews (AMSTAR), AMSTAR 2, and Risk Of Bias In Systematic reviews (ROBIS).

Study design and setting: This is a prospective evaluation using 30 systematic reviews of randomized trials, undertaken at three international centers.

Results: Reviewers completed AMSTAR, AMSTAR 2, and ROBIS in a median (interquartile range) of 15.7 (11.3), 19.7 (12.1), and 28.7 (17.4) minutes and reached consensus in 2.6 (3.2), 4.6 (5.3), and 10.9 (10.8) minutes, respectively. Across all centers, inter-rater reliability was substantial to almost perfect for 8/11 AMSTAR, 9/16 AMSTAR 2, and 12/24 ROBIS items. Intercenter reliability was substantial to almost perfect for 6/11 AMSTAR, 12/16 AMSTAR 2, and 7/24 ROBIS items. Intercenter reliability for confidence in the results of the review or overall risk of bias was moderate (Gwet's first-order agreement coefficient [AC1] 0.58, 95% confidence interval [CI]: 0.30 to 0.85) to substantial (AC1 0.74, 95% CI: 0.30 to 0.85) for AMSTAR 2 and poor (AC1 -0.21, 95% CI: -0.55 to 0.13) to moderate (AC1 0.56, 95% CI: 0.30 to 0.83) for ROBIS. It is not clear whether using the appraisals of any tool as an inclusion criterion would alter an overview's findings.
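
For readers unfamiliar with the statistic reported above, the following is a minimal sketch of Gwet's AC1 for the simplest case of two raters and unweighted nominal categories. It is illustrative only: the abstract does not describe how the coefficients were computed, and the function name `gwet_ac1` and the example ratings are hypothetical, not study data. With observed agreement p_a and average category prevalences pi_q over Q categories, chance agreement is p_e = sum(pi_q * (1 - pi_q)) / (Q - 1), and AC1 = (p_a - p_e) / (1 - p_e).

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters and nominal categories (unweighted sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    q = len(categories)

    # Observed agreement: share of items given the same category by both raters.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Average prevalence of each category across the two raters.
    counts = Counter(rater_a) + Counter(rater_b)
    prevalence = [counts[c] / (2 * n) for c in categories]

    # Chance agreement under Gwet's model.
    p_e = sum(p * (1 - p) for p in prevalence) / (q - 1)

    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings of ten reviews on a single yes/no item (not study data).
rater_a = ["yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
rater_b = ["yes", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no", "no"]
ac1 = gwet_ac1(rater_a, rater_b, categories=["yes", "no"])
print(round(ac1, 2))
```

The verbal labels in the results (poor, moderate, substantial, almost perfect) appear consistent with the commonly used Landis and Koch bands, although the abstract does not name the scale applied.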

Conclusions: Improved guidance may be needed to facilitate the consistent interpretation and application of the newer tools (especially ROBIS).

Keywords: AMSTAR; AMSTAR 2; Methodological quality; ROBIS; Risk of bias; Systematic reviews.

MeSH terms

  • Bias
  • Evidence-Based Medicine
  • Humans
  • Observer Variation
  • Prospective Studies
  • Quality Control
  • Reproducibility of Results
  • Systematic Reviews as Topic / standards*