Background: ADR is a widely used colonoscopy quality indicator. Calculation of ADR is labor-intensive and cumbersome using current electronic medical databases. Natural language processing (NLP) is a method used to extract meaning from unstructured or free text data.
Aims: (1) To develop and validate an accurate automated process for calculation of adenoma detection rate (ADR) and serrated polyp detection rate (SDR) on data stored in widely used electronic health record systems, specifically Epic electronic health record system, Provation® endoscopy reporting system, and Sunquest PowerPath pathology reporting system.
Methods: Screening colonoscopies performed between June 2010 and August 2015 were identified using the Provation® reporting tool. An NLP pipeline was developed to identify adenomas and sessile serrated polyps (SSPs) on pathology reports corresponding to these colonoscopy reports. The pipeline was validated using a manual search. Precision, recall, and effectiveness of the natural language processing pipeline were calculated. ADR and SDR were then calculated.
Results: We identified 8032 screening colonoscopies that were linked to 3821 pathology reports (47.6%). The NLP pipeline had an accuracy of 100% for adenomas and 100% for SSPs. Mean total ADR was 29.3% (range 14.7-53.3%); mean male ADR was 35.7% (range 19.7-62.9%); and mean female ADR was 24.9% (range 9.1-51.0%). Mean total SDR was 4.0% (0-9.6%).
Conclusions: We developed and validated an NLP pipeline that accurately and automatically calculates ADRs and SDRs using data stored in Epic, Provation® and Sunquest PowerPath. This NLP pipeline can be used to evaluate colonoscopy quality parameters at both individual and practice levels.
Keywords: Adenoma detection rate; Electronic health record; Natural language processing; Quality.