Objective: To assess the inter-rater reliability (IRR) and usability of the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool.
Study design and setting: We designed a cross-sectional study. Five raters independently applied ROBINS-I to the nonrandomized cohort studies in three systematic reviews on vaccines, opiate abuse, and rehabilitation. We calculated Fleiss' Kappa for multiple raters as a measure of IRR and discussed the application of ROBINS-I to identify difficulties and possible reasons for disagreement.
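The abstract does not specify the software used, but Fleiss' kappa for multiple raters can be sketched in a few lines of pure Python. The rating table below is hypothetical (three studies rated by five raters into three illustrative risk-of-bias categories), not data from the study:

```python
def fleiss_kappa(table):
    """Fleiss' kappa; table[i][j] = number of raters assigning subject i to category j."""
    n = len(table)        # number of subjects
    r = sum(table[0])     # raters per subject (assumed constant)
    # mean per-subject observed agreement
    p_bar = sum((sum(c * c for c in row) - r) / (r * (r - 1)) for row in table) / n
    # chance agreement from the marginal category proportions
    totals = [sum(row[j] for row in table) for j in range(len(table[0]))]
    p_e = sum((t / (n * r)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# hypothetical counts: columns = low / moderate / serious risk of bias
ratings = [
    [5, 0, 0],   # all five raters agree
    [3, 2, 0],
    [1, 2, 2],
]
print(round(fleiss_kappa(ratings), 3))  # → 0.153
```

Values near 0 indicate agreement barely above chance, which is the range reported in the Results below.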
Results: Thirty-one studies were included (195 evaluations). IRR was slight for the overall judgment (IRR 0.06, 95% CI 0.001 to 0.12) and for individual domains (from 0.04, 95% CI -0.04 to 0.12 for the domain "selection of reported results" to 0.18, 95% CI 0.10 to 0.26 for the domain "deviations from intended interventions"). Mean time to apply the tool was 27.8 minutes (SD 12.6) per study. The main difficulties arose from poor reporting of primary studies, misunderstanding of the questions, translation of the signaling questions into a final judgment, and incomplete guidance.
Conclusion: We found ROBINS-I difficult and demanding, even for raters with substantial expertise in systematic reviews. Calibration exercises and intensive training before its application are needed to improve reliability.
Keywords: Inter-rater reliability; Nonrandomized studies; ROBINS-I; Risk of bias; Systematic reviews.
Copyright © 2019 Elsevier Inc. All rights reserved.