Learning pathogenic proteins across fractured and heterogeneous data

AMIA Annu Symp Proc. 2008 Nov 6:889.

Abstract

In this work, we test a generalized approach to integrating, transforming, and learning from data drawn from disparate sources in order to classify bacterial proteins involved in pathogenesis. We rely on the implicit cross-links between biological databases to retrieve relevant records, and we use statistical learning methods to infer classifications from abundant, albeit noisy, data. Results suggest that different types of public biological information vary in how effective they are for predictive data mining.
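The abstract does not name the databases, features, or learning algorithm used, so the following is only a minimal sketch of the general idea: records from several sources are joined on a shared protein identifier, turned into source-tagged tokens, and passed to a simple statistical classifier. Everything in it is an assumption made for illustration: the source names (`annotation`, `literature`), the protein IDs, the toy text, the labels, and the choice of multinomial naive Bayes with add-one smoothing. It should not be read as the authors' method.

```python
import math
from collections import Counter, defaultdict

def integrate(sources):
    """Merge per-source records into one token list per protein ID."""
    merged = defaultdict(list)
    for source_name, records in sources.items():
        for protein_id, text in records.items():
            # Tag each token with its source so evidence types stay separable.
            merged[protein_id] += [f"{source_name}:{tok}" for tok in text.lower().split()]
    return dict(merged)

def train(features, labels):
    """Collect class and token counts for multinomial naive Bayes."""
    class_counts = Counter(labels.values())
    token_counts = defaultdict(Counter)
    for protein_id, label in labels.items():
        token_counts[label].update(features.get(protein_id, []))
    vocab = {tok for counts in token_counts.values() for tok in counts}
    return class_counts, token_counts, vocab

def predict(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy records keyed by made-up protein IDs (not real data).
sources = {
    "annotation": {
        "P1": "secreted toxin", "P2": "ribosomal subunit",
        "P3": "adhesin toxin", "P4": "ribosomal protein",
        "Q1": "secreted toxin precursor",
    },
    "literature": {
        "P1": "virulence host", "P2": "translation",
        "P3": "host invasion", "Q1": "host damage",
    },
}
labels = {"P1": "pathogenic", "P3": "pathogenic",
          "P2": "non-pathogenic", "P4": "non-pathogenic"}

features = integrate(sources)
model = train(features, labels)
print(predict(model, features["Q1"]))
```

Tagging tokens by source keeps each type of information separable, so one could retrain on a single source at a time to compare how much each contributes, in the spirit of the comparison the abstract reports.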

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Bacterial Proteins / classification*
  • Bacterial Toxins / classification*
  • Databases, Protein*
  • Information Storage and Retrieval / methods
  • Natural Language Processing
  • Pattern Recognition, Automated / methods*
  • Terminology as Topic*
  • Virulence Factors / classification*

Substances

  • Bacterial Proteins
  • Bacterial Toxins
  • Virulence Factors