The GRADE approach is reproducible in assessing the quality of evidence of quantitative evidence syntheses

Reem A Mustafa; Nancy Santesso; Jan Brozek; Elie A Akl; Stephen D Walter; Geoff Norman; Mahan Kulasegaram; Robin Christensen; Gordon H Guyatt; Yngve Falck-Ytter; Stephanie Chang; Mohammad Hassan Murad; Gunn E Vist; Toby Lasserson; Gerald Gartlehner; Vijay Shukla; Xin Sun; Craig Whittington; Piet N Post; Eddy Lang; Kylie Thaler; Ilkka Kunnamo; Heidi Alenius; Joerg J Meerpohl; Ana C Alba; Immaculate F Nevis; Stephen Gentles; Marie-Chantal Ethier; Alonso Carrasco-Labra; Rasha Khatib; Gihad Nesrallah; Jamie Kroft; Amanda Selk; Romina Brignardello-Petersen; Holger J Schünemann

doi:10.1016/j.jclinepi.2013.02.004


J Clin Epidemiol. 2013 Jul;66(7):736-42; quiz 742.e1-5. doi: 10.1016/j.jclinepi.2013.02.004. Epub 2013 Apr 23.

Affiliation

Department of Clinical Epidemiology & Biostatistics, McMaster University, Hamilton, Ontario L8S 4K1, Canada.

PMID: 23623694

Abstract

Objective: We evaluated the inter-rater reliability (IRR) of assessing the quality of evidence (QoE) using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach.

Study design and setting: After completing two training exercises, participants worked independently as individual raters to assess the QoE of 16 outcomes. Raters first recorded their initial impression as a global rating and then graded the QoE following the GRADE approach. Finally, randomly paired raters submitted a consensus rating.

Results: Without the GRADE approach, the IRR for two individual raters was 0.31 (95% confidence interval [95% CI] = 0.21-0.42) among Health Research Methodology students (n = 10) and 0.27 (95% CI = 0.19-0.37) among GRADE Working Group members (n = 15). With the GRADE approach, the corresponding IRR was significantly higher: 0.66 (95% CI = 0.56-0.75) and 0.72 (95% CI = 0.61-0.79), respectively. The IRR increased further with three raters (0.80 [95% CI = 0.73-0.86] and 0.74 [95% CI = 0.65-0.81]) and with four raters (0.84 [95% CI = 0.78-0.89] and 0.79 [95% CI = 0.71-0.85]). Assessing the QoE through a consensus rating did not improve the IRR.

Conclusion: Our findings suggest that, among trained individuals, using the GRADE approach improves reliability compared with intuitive judgments about the QoE, and that two individual raters can reliably assess the QoE using the GRADE system.
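The Results report inter-rater reliability for two, three, and four raters, but the abstract does not name the coefficient used. As a minimal sketch, assuming a two-way random-effects intraclass correlation for absolute agreement (Shrout and Fleiss's ICC(2,1) for a single rater and ICC(2,k) for the mean of k raters), the computation over an outcomes × raters matrix could look like the following. The function name `icc2`, the numeric coding of GRADE levels, and the example ratings are hypothetical, and treating the four ordinal GRADE levels as interval scores is itself an assumption, not necessarily the authors' method.

```python
from statistics import fmean


def icc2(ratings):
    """Hypothetical helper: two-way random-effects ICC, absolute agreement.

    `ratings` is a complete matrix given as a list of rows: one row per
    outcome (target), one column per rater. Returns a tuple
    (ICC(2,1) for a single rater, ICC(2,k) for the mean of k raters).
    """
    n = len(ratings)       # number of outcomes being rated
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n, k >= 2")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA decomposition: outcomes (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Single-rater reliability penalizes both rater bias and residual error.
    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Averaging over k raters shrinks the error terms, so reliability rises.
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Hypothetical coding of the four GRADE quality-of-evidence levels.
GRADE_SCORE = {"very low": 1, "low": 2, "moderate": 3, "high": 4}

if __name__ == "__main__":
    # Hypothetical ratings of five outcomes by two raters.
    pairs = [("high", "high"), ("moderate", "low"), ("low", "low"),
             ("very low", "low"), ("moderate", "moderate")]
    matrix = [[GRADE_SCORE[a], GRADE_SCORE[b]] for a, b in pairs]
    single, average = icc2(matrix)
    print(f"ICC(2,1) = {single:.2f}, ICC(2,2) = {average:.2f}")
```

The average-measures form is included because it illustrates why pooling more raters tends to raise reliability; whether the reported three- and four-rater values were computed this way is not stated in the abstract.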

MeSH terms

Canada
Evidence-Based Medicine / standards*
Humans
Practice Guidelines as Topic / standards*
Pulmonary Disease, Chronic Obstructive / therapy
Reproducibility of Results
Research Design*
Self Care / methods
Surveys and Questionnaires
Validation Studies as Topic