Identifying free-text features to improve automated classification of structured histopathology reports for feline small intestinal disease

Abdullah Awaysheh; Jeffrey Wilcke; François Elvinger; Loren Rees; Weiguo Fan; Kurt Zimmerman

doi:10.1177/1040638717744002

J Vet Diagn Invest. 2018 Mar;30(2):211-217. doi: 10.1177/1040638717744002. Epub 2017 Nov 30.

Authors

Abdullah Awaysheh¹, Jeffrey Wilcke¹, François Elvinger³, Loren Rees², Weiguo Fan², Kurt Zimmerman¹

Affiliations

¹ Department of Biomedical Sciences and Pathobiology, VA-MD College of Veterinary Medicine (Awaysheh, Wilcke, Zimmerman), Virginia Tech, Blacksburg, VA.
² Department of Business Information Technology, Pamplin College of Business (Fan, Rees), Virginia Tech, Blacksburg, VA.
³ Animal Health Diagnostic Center, Cornell University, Ithaca, NY (Elvinger).

Abstract

The histologic evaluation of gastrointestinal (GI) biopsies is the standard for diagnosis of a variety of GI diseases (e.g., inflammatory bowel disease [IBD] and alimentary lymphoma [ALA]). The World Small Animal Veterinary Association (WSAVA) Gastrointestinal International Standardization Group proposed a reporting standard for GI biopsies consisting of a defined set of microscopic features. We compared the machine classification accuracy of free-text microscopic findings with that of findings represented in the WSAVA format for diagnoses of IBD and ALA. Unstructured free-text duodenal biopsy pathology reports from cats (n = 60) with a diagnosis of IBD (n = 20), ALA (n = 20), or normal (n = 20) were identified. Biopsy samples from these cases were then scored following the WSAVA guidelines to create a set of structured reports. Three supervised machine-learning algorithms were trained using the structured and then the unstructured reports. Diagnosis classification accuracy for the 3 algorithms was compared using the structured and unstructured reports. Using naive Bayes and neural networks, unstructured information-based models achieved higher diagnostic accuracy (0.90 and 0.88, respectively) compared to the structured information-based models (0.74 and 0.72, respectively). Results suggest that discriminating diagnostic information was lost using current WSAVA microscopic guideline features. Addition of free-text features (number of plasma cells) increased WSAVA auto-classification performance. The methodologies reported in our study represent a way of identifying candidate microscopic features for use in structured histopathology reports.
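
As an illustration of the free-text classification idea described in the abstract, the following is a minimal sketch of a bag-of-words naive Bayes classifier with Laplace smoothing. It is not the authors' implementation: the abstract does not specify the software, preprocessing, or feature set used, and the report snippets, function names (`tokenize`, `train_nb`, `predict_nb`), and smoothing choice below are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy report snippets invented for this sketch; NOT data from the study.
TRAIN = [
    ("moderate numbers of plasma cells and lymphocytes in lamina propria", "IBD"),
    ("increased plasma cells with mild villus blunting", "IBD"),
    ("monomorphic small lymphocytes infiltrate epithelium and efface mucosa", "ALA"),
    ("sheets of neoplastic lymphocytes efface villus architecture", "ALA"),
    ("villi are tall and slender with no significant lesions", "normal"),
    ("no significant lesions normal lamina propria cellularity", "normal"),
]

def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()

def train_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log posterior under multinomial naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        # Log prior from class frequencies in the training set.
        score = math.log(class_docs[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(TRAIN)
prediction = predict_nb(model, "many plasma cells in the lamina propria")
print(prediction)
```

In this toy setting, words such as "plasma" and "cells" carry the signal for the IBD class, which loosely mirrors the abstract's observation that a free-text feature (number of plasma cells) added discriminating information beyond the structured WSAVA features.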

Keywords: Histopathology report; machine learning; structured report; text mining.

Publication types

Evaluation Study

MeSH terms

Algorithms
Animals
Bayes Theorem
Biopsy / veterinary
Cat Diseases / diagnosis*
Cat Diseases / pathology
Cats
Diagnostic Techniques and Procedures / veterinary
Duodenum / pathology
Female
Gastrointestinal Neoplasms / diagnosis
Gastrointestinal Neoplasms / veterinary*
Inflammatory Bowel Diseases / diagnosis
Inflammatory Bowel Diseases / veterinary
Lymphoma / diagnosis
Lymphoma / veterinary
Machine Learning
Male
Neural Networks, Computer