Background: Diagnostic test sets are a valuable research tool that contributes importantly to the validity and reliability of studies that assess agreement in breast pathology. In order to fully understand the strengths and weaknesses of any agreement and reliability study, however, the methods should be fully reported. In this paper we provide a step-by-step description of the methods used to create four complex test sets for a study of diagnostic agreement among pathologists interpreting breast biopsy specimens. We use the newly developed Guidelines for Reporting Reliability and Agreement Studies (GRRAS) as a basis to report these methods.
Methods: Breast tissue biopsies were selected from the National Cancer Institute-funded Breast Cancer Surveillance Consortium sites. We used a random sampling stratified according to woman's age (40-49 vs. ≥50), parenchymal breast density (low vs. high) and interpretation of the original pathologist. A 3-member panel of expert breast pathologists first independently interpreted each case using five primary diagnostic categories (non-proliferative changes, proliferative changes without atypia, atypical ductal hyperplasia, ductal carcinoma in situ, and invasive carcinoma). When the experts did not unanimously agree on a case diagnosis a modified Delphi method was used to determine the reference standard consensus diagnosis. The final test cases were stratified and randomly assigned into one of four unique test sets.
Conclusions: We found GRRAS recommendations to be very useful in reporting diagnostic test set development and recommend inclusion of two additional criteria: 1) characterizing the study population and 2) describing the methods for reference diagnosis, when applicable.