Background: Since January 2003, all clinical scientific articles published in the American volume of The Journal of Bone and Joint Surgery (JBJS-A) have included a level-of-evidence rating. The aim of the current study was to evaluate the interobserver agreement among reviewers with varying levels of epidemiology training in categorizing the levels of evidence of these clinical studies.
Methods: Fifty-one consecutive clinical papers published in JBJS-A were identified by a computerized search of the table of contents from January 2003 through June 2003. Each paper was blinded so that the reviewers received only the title, the abstract (with the level-of-evidence designation removed), and the methods section. The papers were coded and arranged in random order in a binder. Six surgeons graded each blinded paper for (1) the type of study (therapeutic, prognostic, diagnostic test, or economic or decision analysis), (2) the level of evidence (on a scale of I through V), and (3) the subcategory within the particular level of evidence. Three of the surgeons were members of the JBJS-A Editorial Board, two were reviewers for JBJS-A, and one was an active researcher not formally associated with JBJS-A. The reviewers did not receive any formal training in the application of the classification system, but each was provided with a detailed description of the system used by JBJS-A. Intraclass correlation coefficients with 95% confidence intervals were determined for the reviewers' agreement regarding the type of study, the level of evidence, and the subcategory within the level of evidence.
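The abstract does not state which form of the intraclass correlation coefficient was computed. As a rough illustration only, the following minimal sketch computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from an articles-by-reviewers matrix; the model choice and the ratings shown are assumptions for demonstration, not the study's data, and the 95% confidence intervals reported by the authors are omitted for brevity.

    import numpy as np

    def icc2_1(ratings: np.ndarray) -> float:
        # ICC(2,1): two-way random effects, absolute agreement, single rater.
        # `ratings` is an (n articles x k reviewers) matrix of numeric scores,
        # e.g., levels I through V coded as 1 through 5.
        n, k = ratings.shape
        grand = ratings.mean()
        row_means = ratings.mean(axis=1)  # per-article means
        col_means = ratings.mean(axis=0)  # per-reviewer means
        msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between-articles mean square
        msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between-reviewers mean square
        resid = ratings - row_means[:, None] - col_means[None, :] + grand
        mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # residual mean square
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

    # Hypothetical data: five articles rated by three reviewers (invented for illustration).
    ratings = np.array([[4, 4, 4],
                        [2, 3, 2],
                        [4, 4, 5],
                        [1, 1, 1],
                        [3, 3, 3]])
    print(round(icc2_1(ratings), 2))  # prints 0.93; values near 1.0 indicate near-perfect agreement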
Results: The majority (69%) of the fifty-one included articles were studies of therapy, and 57% of the studies constituted Level-IV evidence. The intraclass correlation coefficients for the agreement among all reviewers with regard to the study type, the level of evidence, and the subcategory within the level of evidence ranged from 0.61 to 0.75. Across all aspects of the classification system, the reviewers trained in epidemiology demonstrated greater agreement (intraclass correlation coefficients, 0.99 to 1.0) than did the reviewers who were not trained in epidemiology (intraclass correlation coefficients, 0.60 to 0.75).
Conclusions: These findings suggest that both epidemiology-trained and non-epidemiology-trained reviewers can apply the levels-of-evidence guide to published studies with acceptable interobserver agreement. The validity of the system itself remains a question for future research.