Procedures are compared for testing the homogeneity of k ≥ 2 independent kappa statistics in the case of two raters and a dichotomous outcome. One procedure is based on the estimated large-sample variance of kappa derived under a model frequently adopted for inferences concerning interobserver agreement. The other is based on a goodness-of-fit approach to the same model. The results of a Monte Carlo simulation show that the two approaches have similar properties when the number of subjects in each sample is large (> 100) and the prevalence of the underlying trait of interest is not extreme. The goodness-of-fit approach is recommended for comparisons involving smaller numbers of subjects or a low prevalence of the underlying trait (< 0.3).
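The following is a minimal sketch of how the two kinds of statistic might be computed. It is not the authors' exact procedure, and several details are assumptions. Each sample is summarized by three counts (both raters positive, raters discordant, both negative). The model is taken to be the common correlation model, with cell probabilities π² + π(1 − π)κ, 2π(1 − π)(1 − κ), and (1 − π)² + π(1 − π)κ. The large-sample variance (1 − κ)/n · [(1 − κ)(1 − 2κ) + κ(2 − κ)/(2π(1 − π))] is assumed. The pooled kappa is an inverse-variance weighted average, and the per-sample prevalence estimate is the marginal proportion of positive ratings, both also assumptions. The names `Table`, `estimate`, `kappa_variance`, `variance_test`, and `goodness_of_fit_test` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Counts for one sample of subjects, each rated by two raters (hypothetical)."""
    both_pos: int    # both raters record "+"
    discordant: int  # raters disagree, in either order
    both_neg: int    # both raters record "-"

    @property
    def n(self) -> int:
        return self.both_pos + self.discordant + self.both_neg


def estimate(t: Table) -> tuple[float, float]:
    """Prevalence and kappa estimates under the assumed common correlation model."""
    pi = (2 * t.both_pos + t.discordant) / (2 * t.n)
    # Under the model P(discordant) = 2*pi*(1-pi)*(1-kappa); solve for kappa.
    kappa = 1 - t.discordant / (2 * t.n * pi * (1 - pi))
    return pi, kappa


def kappa_variance(pi: float, kappa: float, n: int) -> float:
    """Assumed large-sample variance of kappa-hat (requires 0 < pi < 1)."""
    return (1 - kappa) / n * (
        (1 - kappa) * (1 - 2 * kappa)
        + kappa * (2 - kappa) / (2 * pi * (1 - pi))
    )


def variance_test(tables: list[Table]) -> tuple[float, int, float]:
    """Variance-based homogeneity statistic; approx. chi-square with k-1 df."""
    ests = [estimate(t) for t in tables]
    weights = [1 / kappa_variance(pi, k, t.n) for (pi, k), t in zip(ests, tables)]
    # Inverse-variance weighted pooled kappa (an assumption of this sketch).
    kbar = sum(w * k for w, (_, k) in zip(weights, ests)) / sum(weights)
    stat = sum(w * (k - kbar) ** 2 for w, (_, k) in zip(weights, ests))
    return stat, len(tables) - 1, kbar


def goodness_of_fit_test(tables: list[Table], kappa: float) -> tuple[float, int]:
    """Pearson goodness-of-fit of the model with a common kappa; k-1 df."""
    stat = 0.0
    for t in tables:
        pi, _ = estimate(t)  # per-sample prevalence; kappa is shared
        q = pi * (1 - pi)
        probs = (pi ** 2 + q * kappa, 2 * q * (1 - kappa), (1 - pi) ** 2 + q * kappa)
        observed = (t.both_pos, t.discordant, t.both_neg)
        for o, p in zip(observed, probs):
            expected = t.n * p
            stat += (o - expected) ** 2 / expected
    # 2 free cells per sample, minus k prevalences and 1 common kappa: k-1 df.
    return stat, len(tables) - 1
```

For example, `stat, df, kbar = variance_test(tables)` followed by `goodness_of_fit_test(tables, kbar)` gives both statistics. If SciPy is available, `scipy.stats.chi2.sf(stat, df)` converts either one to an approximate p-value. The two tests share the same degrees of freedom, which is what makes a side-by-side comparison such as the simulation above meaningful.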