A Bayesian derived network of breast pathology co-occurrence

J Biomed Inform. 2008 Apr;41(2):242-50. doi: 10.1016/j.jbi.2007.12.005. Epub 2008 Jan 1.

Abstract

In this paper, we present the validation and verification of a machine-learning-based Bayesian network of breast pathology co-occurrence. The present/not present occurrences of 29 common breast pathologies from 1631 pathology reports were used to build the network. All pathology reports were developed by a single pathologist. The resulting network has 25 diagnosis nodes interconnected by 40 arcs. Each arc represents a predicted co-occurrence or null co-occurrence. Model verification involved assessing the robustness of the original network structure after random exclusion of 25%, 50%, and 75% of the pathology report dataset. The structure of the network appears stable: random removal of 75% of the records in the original dataset leaves 81% of the original network intact. Model validation was primarily assessed by review of the breast pathology literature for each arc in the network. Almost all network-identified co-occurrences (95%) have been published in the breast pathology literature or were verified by expert opinion. In conclusion, the Bayesian network of breast pathology co-occurrence presented here is both robust with respect to incomplete data and validated by consistency with the breast pathology literature and by expert opinion. Further, the ability to use a specific pathology observation to predict multiple co-occurring pathologies enables intuitive exploration of pathology co-occurrence patterns, which may have broader application in both the clinical breast pathology community and the breast cancer research community.
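The verification step described above (randomly excluding a fraction of the reports, relearning the structure, and measuring how much of the original network survives) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the structure-learning algorithm, so a simple pairwise phi-coefficient threshold stands in for Bayesian network learning, arcs are treated as undirected, and the names `subsample`, `phi`, `learn_arcs`, `arc_retention`, and the `threshold` value are hypothetical.

```python
import random
from itertools import combinations


def subsample(records, drop_fraction, rng):
    """Return the records left after randomly excluding drop_fraction of them."""
    keep = round(len(records) * (1.0 - drop_fraction))
    return rng.sample(records, keep)


def phi(records, a, b):
    """Phi coefficient between two present/not-present findings a and b."""
    n = len(records)
    n_ab = sum(1 for r in records if r[a] and r[b])
    n_a = sum(1 for r in records if r[a])
    n_b = sum(1 for r in records if r[b])
    denom = (n_a * (n - n_a) * n_b * (n - n_b)) ** 0.5
    # A finding that is always present or always absent carries no association.
    return 0.0 if denom == 0 else (n * n_ab - n_a * n_b) / denom


def learn_arcs(records, findings, threshold=0.3):
    """Stand-in structure learner (an assumption, NOT the paper's method):
    link two findings when |phi| reaches the threshold. A positive phi is
    loosely analogous to a co-occurrence, a negative phi to a null co-occurrence."""
    return {
        frozenset((a, b))
        for a, b in combinations(findings, 2)
        if abs(phi(records, a, b)) >= threshold
    }


def arc_retention(original_arcs, relearned_arcs):
    """Fraction of the original arcs that reappear in the relearned structure."""
    if not original_arcs:
        return 1.0
    return len(original_arcs & relearned_arcs) / len(original_arcs)


def robustness(records, findings, drop_fractions=(0.25, 0.50, 0.75), seed=0):
    """Arc retention after excluding each fraction of the records once."""
    rng = random.Random(seed)
    original = learn_arcs(records, findings)
    return {
        f: arc_retention(original, learn_arcs(subsample(records, f, rng), findings))
        for f in drop_fractions
    }
```

Here each record is assumed to be a mapping from a finding name to a truthy (present) or falsy (not present) value, and `robustness(records, findings)` returns one retention fraction per exclusion level. A figure such as the reported 81% at 75% exclusion would correspond to a retention of 0.81 under the authors' own learner, which this sketch does not reproduce.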

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Bayes Theorem
  • Breast Neoplasms / classification*
  • Breast Neoplasms / diagnosis*
  • Breast Neoplasms / pathology
  • Diagnosis, Computer-Assisted / methods*
  • Female
  • Humans
  • Pattern Recognition, Automated / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity