The automatic discovery of structural principles describing protein fold space

J Mol Biol. 2003 Jul 18;330(4):839-50. doi: 10.1016/s0022-2836(03)00620-x.

Abstract

The study of protein structure has been driven largely by the careful inspection of experimental data by human experts. However, the rapid determination of protein structures by structural-genomics projects will make it increasingly difficult to analyse the distribution of proteins in fold space, and to determine the principles responsible for it, by inspection alone. Here, we demonstrate a machine-learning strategy that automatically determines the structural principles describing 45 folds. The rules learnt were shown to be both statistically significant and meaningful to protein experts. With the growing emphasis on high-throughput experimental initiatives, machine learning and other automated methods of analysis will become increasingly important for many biological problems.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Databases as Topic
  • Immunoglobulins / chemistry
  • Models, Molecular
  • Protein Folding*
  • Proteins / chemistry*
  • Software
  • src Homology Domains

Substances

  • Immunoglobulins
  • Proteins