Background: The NIDDK IBD Genetics Consortium (IBDGC) collects DNA and phenotypic data from inflammatory bowel disease (IBD) subjects to provide a resource for genetic studies. No previous studies have been performed on the reliability and validity of phenotypic determinations in either Crohn's disease (CD) or ulcerative colitis (UC) using primary records. Our aim was to determine the reliability and validity of these phenotypic assessments.
Methods: The de-identified records of 30 IBD patients were reviewed by 2 phenotypers per center using a standard protocol for phenotypic assessment. Each phenotyper evaluated 10 charts on 2 occasions 5 months apart. Reliability was expressed as the kappa (κ) statistic. Performance characteristics were determined by comparison to a consensus-derived "gold standard" and by generation of receiver operating characteristic (ROC) curves.
Results: Agreement for diagnosis was excellent (kappa = 0.82; 95% confidence interval [CI]: 0.71-0.92). Agreement for CD location was good for jejunal, ileal, colorectal, and perianal disease, with kappa between 0.60 and 0.74, but was fair for esophagogastroduodenal disease (kappa = 0.36). Agreement for UC extent (kappa = 0.67; 95% CI: 0.48-0.85) and CD behavior (kappa = 0.67; 95% CI: 0.49-0.83) was very good. Area under the ROC curves was greater than 0.84 for diagnosis, CD behavior, UC extent, and ileal and colonic CD location.
Conclusions: IBD phenotype classification using a standard protocol exhibited very good to excellent inter- and intrarater agreement and validity. This study highlights the importance of standard protocols in generating reliable and valid phenotypic assessments. The data will facilitate estimates of phenotyping misclassification rates that should be considered when making inferences from IBD genotype-phenotype studies.
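Illustrative note: for readers unfamiliar with the kappa statistic named in the Methods, the following is a minimal sketch of chance-corrected agreement between two sets of categorical ratings. It assumes the simple two-rater (Cohen's) form; the abstract does not state which kappa variant was used, and a multi-rater or weighted form may have been applied in the study. The function name `cohen_kappa` and the example labels are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: two-rater, unweighted kappa. The study may have used a
    different variant (e.g. a multi-rater kappa).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(rater_a)

    # Observed agreement: fraction of charts given the same label twice.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category for every chart.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses for 10 charts reviewed on two occasions
# (invented for illustration only, not taken from the study).
first_review = ["CD", "CD", "UC", "UC", "CD", "UC", "CD", "CD", "UC", "CD"]
second_review = ["CD", "CD", "UC", "CD", "CD", "UC", "CD", "UC", "UC", "CD"]

kappa = cohen_kappa(first_review, second_review)
print(f"kappa = {kappa:.2f}")
```

In this invented example the two reviews agree on 8 of 10 charts (observed agreement 0.80), while agreement expected by chance from the marginal frequencies is 0.52, so kappa is (0.80 - 0.52) / (1 - 0.52), or about 0.58.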