Objective: The radiographic diagnosis of osteoarthritis (OA) in the peripheral skeleton depends on the skilled examination of several morphological characteristics of the condition as visualised on plain radiographs. The process is perceptual, however, and is generally enhanced by comparison against photographic standards. This study assessed the intra-rater and inter-rater reliability of radiologists experienced in reporting hand, hip and knee films, derived from a community-based sample, when using the photographic atlas recently developed by Burnett et al.
Methods: This study was part of a multifaceted diagnostics protocol evaluating methodological issues in the conduct of genetic research in osteoarthritis. From a cohort of 118 twin pairs registered with the Australian Twins Registry (ATR), standard clinical examinations were performed on 74 complete and 11 incomplete pairs of twins aged over 50 years, followed by standard AP hand, AP pelvis and AP standing knee radiographs. The pairs were selected to represent both twin pairs who had previously self-reported a diagnosis of OA and those who had not. Radiologists read the films blind to the original self-reported diagnosis and without reference to the twin pairing. The films were read by comparison against photographic standards and were scored according to specific features. All films were read independently by two consultant radiologists, each blind to the other's assessments, and selected films were thereafter assigned for rereading.
Results: Inter-rater and intra-rater agreement varied by anatomic area and by radiographic feature, and intra-rater agreement also differed between the two radiologists. For the presence or absence of OA, intra-rater agreement for the two radiologists was: actual observed agreement = 0.79 to 0.97 and 0.83 to 0.98; adjusted kappa statistic = 0.58 to 0.94 and 0.67 to 0.96. Inter-rater agreement was: actual observed agreement = 0.77 to 0.97; adjusted kappa statistic = 0.54 to 0.94. Agreement was generally high in most of the principal target joints for OA: DIP, PIP, 1st CMC, hip and knee.
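The abstract does not state how the adjusted kappa was calculated. The reported values are, however, numerically consistent with the prevalence-adjusted, bias-adjusted kappa for a two-category rating, which equals 2 x (observed agreement) - 1 (for example, 2 x 0.79 - 1 = 0.58). The following minimal Python sketch, written under that assumption, computes observed agreement, Cohen's kappa and the adjusted kappa for two sets of present/absent readings. The function names and the example readings are hypothetical and are not taken from the study.

```python
from collections import Counter
import math


def observed_agreement(a, b):
    """Proportion of films on which the two readings agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("readings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    po = observed_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each reader's marginal category frequencies.
    pe = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / (n * n)
    if math.isclose(pe, 1.0):
        return float("nan")  # undefined when both readers use a single category
    return (po - pe) / (1 - pe)


def adjusted_kappa(a, b):
    """Prevalence- and bias-adjusted kappa for two categories (assumed form)."""
    return 2 * observed_agreement(a, b) - 1


# Hypothetical readings: 1 = OA present, 0 = OA absent (illustrative only).
reader_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reader_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

print(round(observed_agreement(reader_1, reader_2), 2))
print(round(cohen_kappa(reader_1, reader_2), 2))
print(round(adjusted_kappa(reader_1, reader_2), 2))
```

The same functions apply to intra-rater agreement if the two lists hold one radiologist's first and repeat readings of the same films.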
Conclusions: Assessor agreement was not perfect. Nevertheless, for genetic epidemiology purposes, it is possible for radiographs to be examined accurately by a single experienced assessor, although duplicate assessments may be advantageous. For less experienced assessors, independent examinations should be made by at least two assessors, and either a consensus should be reached on disparate examinations or an algorithm should be developed to adjudicate any discrepancies.