A series of prostate cancer histological slides from 71 patients was used to measure interobserver variation among three pathologists awarding a Gleason score. The study was prompted by the use of histological grade to stratify patients before randomization in two clinical trials currently recruiting at our centre, and by a proposed study that would allocate treatment according to the score awarded. The pathologists were asked to award a score based on their day-to-day experience, with no prior consensus meeting to agree on the grey areas of the Gleason grading system. We used the kappa statistic to assess the level of agreement, calculated both for the raw scores awarded by the three observers and for the grouped scores corresponding to the groupings used for stratification in the two trials. The interobserver variation (weighted kappa) was 0.16 to 0.29 for the raw scores (Gleason scores 2-10) and 0.15 to 0.29 for the grouped scores (Gleason scores ≤7 or ≥8). For the raw scores, the total agreement rate was 9.9% and the total disagreement rate 26.8%; for the grouped scores, the total agreement rate was 43.7%. It is concluded that, despite this level of agreement, stratification using the Gleason score is not a concern, because of the subsequent randomization. However, using a reported Gleason score to determine treatment might be inappropriate. These data indicate the value of a central review process for pathology grading in clinical trials, especially where treatment is directly affected by this information.
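
The abstract does not state how the pairwise kappa values were obtained or which weighting scheme was used. The following is a minimal sketch of how pairwise weighted kappa between raters can be computed, assuming linear weights and illustrative placeholder scores (the rater labels and data below are not the study data).

```python
# Sketch of pairwise weighted kappa between three raters.
# Scores and rater names are illustrative placeholders, not study data.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical Gleason scores (range 2-10) awarded by three pathologists
scores = {
    "pathologist_A": [6, 7, 8, 6, 9, 7, 5],
    "pathologist_B": [7, 7, 9, 6, 8, 6, 6],
    "pathologist_C": [6, 8, 8, 7, 9, 7, 5],
}

# Weighted kappa for each pair of observers; linear weights penalise
# disagreements in proportion to how far apart the scores are.
for (name1, s1), (name2, s2) in combinations(scores.items(), 2):
    kappa = cohen_kappa_score(s1, s2, weights="linear")
    print(f"{name1} vs {name2}: weighted kappa = {kappa:.2f}")

# Grouped scores (<=7 vs >=8), mirroring the stratification groupings;
# unweighted kappa is used for the binary grouping.
grouped = {name: [0 if s <= 7 else 1 for s in vals] for name, vals in scores.items()}
for (name1, g1), (name2, g2) in combinations(grouped.items(), 2):
    kappa = cohen_kappa_score(g1, g2)
    print(f"{name1} vs {name2} (grouped): kappa = {kappa:.2f}")
```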