Objectives: To assess the inter-rater reliability (IRR) of AMSTAR and ROBIS in judging individual domains and the overall methodological quality/risk of bias of systematic reviews, the concurrent validity of the two tools, and the time required to apply them.
Study design and setting: This was a cross-sectional study. Five raters independently read 31 systematic reviews and applied AMSTAR and ROBIS to each. We calculated Fleiss' kappa for multiple raters for individual domains and for overall methodological quality/risk of bias. To explore concurrent validity, we matched similar domains assessed by both tools, as well as the final scores, and compared them using the Kendall tau correlation.
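As an illustration of the agreement statistic named above, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per review. It is not the authors' analysis code: the function name `fleiss_kappa`, the category labels, and the toy ratings are all hypothetical, and it computes only the point estimate, without the confidence intervals reported in the study.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items each rated by the same number of raters.

    ratings    -- list of items; each item is a list of category labels,
                  one label per rater (hypothetical input layout)
    categories -- list of all possible category labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # All ratings fall in one category; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (invented, not from the study): 4 reviews, 5 raters each.
toy = [
    ["high", "high", "high", "low", "high"],
    ["low", "low", "low", "low", "low"],
    ["high", "unclear", "high", "high", "low"],
    ["low", "low", "high", "low", "low"],
]
print(fleiss_kappa(toy, ["low", "high", "unclear"]))
```

Each inner list holds one review's judgments from all raters, so the same function can be applied per domain or to the overall judgment.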
Results: IRR ranged from fair to perfect for AMSTAR and from moderate to substantial for ROBIS. Kappa for overall quality/risk of bias was 0.73 (95% confidence interval [CI] 0.65-0.81) for AMSTAR and 0.64 (95% CI 0.54-0.74) for ROBIS. With AMSTAR, we judged most of the reviews (53%) to be of intermediate quality, while with ROBIS judgments were split between high (53%) and low (47%) risk of bias. The correlation between judgments on similar domains ranged from moderate to high, while it was fair for the overall judgment (K = 0.35, 95% CI 0.21-0.49). The mean time to complete ROBIS was about double that for AMSTAR.
Conclusion: AMSTAR and ROBIS offer similar IRR but differ in their construct and applicability.
Keywords: AMSTAR; Guidelines; Methodological quality; ROBIS; Risk of bias; Systematic reviews.
Copyright © 2018 Elsevier Inc. All rights reserved.