Although colonoscopy is the most frequently performed endoscopic procedure, the lack of standardized reporting impedes clinical and translational research. Inadequate data extraction from the raw, unstructured text in electronic health records (EHR) poses an additional challenge to reporting procedure quality metrics, because vital details about the procedure are stored in disparate documents. Currently, no EHR workflow links these documents to the specific colonoscopy procedure, which makes data extraction error-prone. We hypothesize that extracting comprehensive colonoscopy quality metrics from consolidated procedure documents using computational linguistic techniques, and integrating these metrics with discrete EHR data, can improve the quality of screening and the cancer detection rate. As a first step, we developed an algorithm that links colonoscopy, pathology, and imaging documents by analyzing the chronology of the various orders placed relative to the colonoscopy procedure. The algorithm was installed and validated at the University of Arkansas for Medical Sciences (UAMS). The proposed algorithm, in conjunction with Natural Language Processing (NLP) techniques, can overcome the current limitations of manual data abstraction.
Keywords: Colonoscopy; data integration; electronic health records; natural language processing; quality improvement.
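
The chronology-based linking described in the abstract can be illustrated with a minimal sketch. It is not the algorithm validated at UAMS; it only shows one plausible way to attach pathology and imaging documents to a colonoscopy by comparing order times. The `Order` record, its field names, the `link_documents` function, and the 30-day window are all assumptions made for illustration and do not come from the source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Order:
    """Hypothetical order record; the field names are illustrative only."""
    patient_id: str
    kind: str            # "colonoscopy", "pathology", or "imaging"
    ordered_at: datetime
    doc_id: str


def link_documents(orders, window=timedelta(days=30)):
    """Map each colonoscopy doc_id to the related doc_ids linked to it.

    A pathology or imaging document is linked to the same patient's
    colonoscopy whose order time is nearest, provided the gap is within
    `window`. The 30-day default is an assumed value, not a validated one.
    """
    procedures = [o for o in orders if o.kind == "colonoscopy"]
    links = {p.doc_id: [] for p in procedures}

    for order in orders:
        if order.kind == "colonoscopy":
            continue
        # Candidate procedures: same patient, ordered close enough in time.
        candidates = [
            p for p in procedures
            if p.patient_id == order.patient_id
            and abs(order.ordered_at - p.ordered_at) <= window
        ]
        if candidates:
            nearest = min(
                candidates,
                key=lambda p: abs(order.ordered_at - p.ordered_at),
            )
            links[nearest.doc_id].append(order.doc_id)

    return links
```

Attaching each document to the nearest procedure, rather than to every procedure inside the window, keeps a document from being counted twice when a patient has repeat colonoscopies close together. Documents with no procedure inside the window are left unlinked.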