The revised Cochrane risk of bias tool for randomized trials (RoB 2) showed low interrater reliability and challenges in its application

Silvia Minozzi; Michela Cinquini; Silvia Gianola; Marien Gonzalez-Lorenzo; Rita Banzi

doi:10.1016/j.jclinepi.2020.06.015

J Clin Epidemiol. 2020 Oct:126:37-44. doi: 10.1016/j.jclinepi.2020.06.015. Epub 2020 Jun 18.

Authors

Silvia Minozzi¹, Michela Cinquini², Silvia Gianola³, Marien Gonzalez-Lorenzo², Rita Banzi⁴

Affiliations

¹ Department of Epidemiology, Lazio Regional Health Service, Rome, Italy; Laboratory of Clinical Research Methodology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy.
² Laboratory of Clinical Research Methodology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy.
³ Unit of Clinical Epidemiology, IRCCS Istituto Ortopedico Galeazzi, Milan, Italy.
⁴ Center for Health Regulatory Policies, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy.

PMID: 32562833

Abstract

Objective: To assess the interrater reliability (IRR) and usability of the revised Cochrane risk of bias tool for randomized trials (RoB 2).

Study design and setting: This is a cross-sectional study. Four raters independently applied RoB 2 to the primary outcome of each trial in a random sample of individually randomized, parallel-group randomized controlled trials (RCTs). We calculated Fleiss' kappa for multiple raters, measured the time needed to complete the tool, and discussed the application of RoB 2 to identify difficulties and reasons for disagreement.
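As an illustration of the agreement statistic named above, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per item. It is not the authors' analysis code: the function name `fleiss_kappa`, the input layout (one list of labels per outcome), and the toy ratings are assumptions made for the example. The three category labels follow the RoB 2 judgment levels.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each rated by the same number of raters.

    ratings    -- list of items; each item is a list of category labels,
                  one label per rater (hypothetical layout for this sketch)
    categories -- all labels a rater may assign
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who put item i in category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        row = [tally[c] for c in categories]
        if sum(row) != n_raters:
            raise ValueError("rating outside the declared categories")
        counts.append(row)

    # Observed agreement: mean proportion of agreeing rater pairs per item
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall share of each category
    total = n_items * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        return float("nan")  # only one category ever used: kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (NOT from the study): 3 outcomes, 4 raters each
ROB2_LEVELS = ["low", "some concerns", "high"]
toy = [
    ["low", "low", "low", "some concerns"],
    ["high", "some concerns", "high", "high"],
    ["low", "some concerns", "high", "some concerns"],
]
print(round(fleiss_kappa(toy, ROB2_LEVELS), 2))
```

In practice, an established implementation such as the one in a statistics package would normally be preferred; the sketch only makes the observed-versus-chance structure of the statistic explicit.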

Results: A total of 70 outcomes from 70 RCTs were included. IRR was slight for the overall judgment (IRR 0.16, 95% confidence interval (CI) 0.08-0.24). In the individual domain analysis, IRR was moderate for "randomization process" (IRR 0.45, 95% CI 0.37-0.53). For "deviations from intended intervention", IRR was slight for RCTs assessing the effect of assignment to an intervention (IRR 0.04, 95% CI -0.06 to 0.14) and fair for those assessing the effect of adhering to an intervention (IRR 0.21, 95% CI 0.11-0.31). IRR was fair for the other domains, ranging from 0.22 (95% CI 0.14-0.30) for "missing outcome data" to 0.30 (95% CI 0.22-0.38) for "selection of reported results". The mean time to apply the tool was 28 minutes (standard deviation 13.4) per study outcome. The main difficulties were due to poor knowledge of the subject matter of the primary studies, new terminology, approaches to some domains that differ from those of the previous tool, and the way the signaling questions are formulated.
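The verbal labels used in the results (slight, fair, moderate) match the commonly used Landis and Koch bands for kappa. The abstract does not name the scale, so treating it as the one applied here is an assumption. Under that assumption, the sketch below maps each reported point estimate to its label; the function name `agreement_label` and the dictionary keys are hypothetical, while the numeric values are those given in the results.

```python
def agreement_label(kappa):
    """Map a kappa value to a Landis and Koch style verbal label.

    Assumption: the abstract's wording follows these bands; it does not say so.
    """
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Point estimates as reported in the results paragraph
reported = {
    "overall judgment": 0.16,
    "randomization process": 0.45,
    "deviations (effect of assignment)": 0.04,
    "deviations (effect of adhering)": 0.21,
    "missing outcome data": 0.22,
    "selection of reported results": 0.30,
}

labels = {name: agreement_label(value) for name, value in reported.items()}
for name, label in labels.items():
    print(f"{name}: {reported[name]:.2f} -> {label}")
```

Note that the label depends on the point estimate only. Several of the reported confidence intervals span more than one band, for example 0.11-0.31 for the effect of adhering.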

Conclusions: RoB 2 is a detailed and comprehensive tool, but it is difficult and demanding to apply, even for raters with substantial expertise in systematic reviews. Calibration exercises and intensive training are needed before its application to improve reliability.

Keywords: Interrater reliability; Randomized controlled trials; Risk of bias; RoB 2; Systematic reviews.

Publication types

Comparative Study

MeSH terms

Bias
Cross-Sectional Studies
Data Analysis
Data Collection / methods*
Humans
Judgment / physiology*
Knowledge
Outcome Assessment, Health Care
Randomized Controlled Trials as Topic
Reproducibility of Results
Research Design
Research Personnel / statistics & numerical data*
Research Personnel / trends
Risk