Comparative analysis of protein domain organization

Genome Res. 2004 Mar;14(3):343-53. doi: 10.1101/gr.1610504.

Abstract

We have developed a set of graph-theory-based tools, which we call Comparative Analysis of Protein Domain Organization (CADO), to survey and compare protein domain organizations of different organisms. In the language of CADO, the organization of protein domains in a given organism is shown as a domain graph in which protein domains are represented as vertices, and domain combinations, defined as instances of two domains found in one protein, are represented as edges. CADO provides a new way to analyze and compare whole proteomes, including identifying the consensus and the differences in domain organization between organisms. CADO was used to analyze and compare >50 bacterial, archaeal, and eukaryotic genomes. Examples and overviews presented here include the analysis of the modularity of domain graphs and the functional study of domains based on the graph topology. We also report on the results of comparing domain graphs of two organisms, Pyrococcus horikoshii (an extremophile) and Haemophilus influenzae (a parasite with a reduced genome), with other organisms. Our comparison provides new insights into the genome organization of these organisms. Finally, we report on the specific domain combinations characterizing the three kingdoms of life, and the kingdom "signature" domain organizations derived from those specific domain combinations.
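
The domain graph described in the abstract can be illustrated with a minimal sketch. This is not the CADO implementation; it only assumes the stated definition (domains as vertices, two domains found in one protein as an edge). The function names `build_domain_graph` and `compare_domain_graphs`, the toy proteomes, and the choice to link every pair of distinct domains in a protein (rather than only adjacent ones) are assumptions made for illustration.

```python
from itertools import combinations


def build_domain_graph(proteome):
    """Build a domain graph from a proteome.

    proteome: dict mapping a protein id to the list of domain names it contains.
    Returns (vertices, edges), where each edge is a sorted pair of domain names.
    Assumption: any two distinct domains in the same protein form an edge.
    """
    vertices = set()
    edges = set()
    for domains in proteome.values():
        unique = sorted(set(domains))  # ignore repeated copies of a domain
        vertices.update(unique)
        for a, b in combinations(unique, 2):
            edges.add((a, b))
    return vertices, edges


def compare_domain_graphs(graph_a, graph_b):
    """Split domain combinations into shared and organism-specific sets."""
    _, edges_a = graph_a
    _, edges_b = graph_b
    return {
        "consensus": edges_a & edges_b,  # combinations found in both organisms
        "only_a": edges_a - edges_b,     # combinations specific to organism A
        "only_b": edges_b - edges_a,     # combinations specific to organism B
    }


# Hypothetical toy proteomes, not data from the paper.
organism_a = {"protA1": ["kinase", "SH2", "SH3"], "protA2": ["kinase"]}
organism_b = {"protB1": ["SH2", "kinase"], "protB2": ["PDZ", "SH3"]}

graph_a = build_domain_graph(organism_a)
graph_b = build_domain_graph(organism_b)
result = compare_domain_graphs(graph_a, graph_b)
print(result["consensus"])
```

In this toy case the only shared combination is the SH2/kinase pair. Storing edges as sorted pairs in a set keeps the comparison to plain set intersection and difference, which is one simple way to express the "consensus and difference" idea.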

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Arabidopsis Proteins / chemistry
  • Arabidopsis Proteins / genetics
  • Arabidopsis Proteins / physiology
  • Archaeal Proteins / chemistry
  • Archaeal Proteins / genetics
  • Archaeal Proteins / physiology
  • Caenorhabditis elegans Proteins / chemistry
  • Caenorhabditis elegans Proteins / genetics
  • Caenorhabditis elegans Proteins / physiology
  • Cluster Analysis
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Drosophila Proteins / chemistry
  • Drosophila Proteins / genetics
  • Genetic Heterogeneity
  • Genome
  • Genome, Archaeal
  • Genome, Bacterial
  • Genome, Fungal
  • Genome, Human
  • Genome, Plant
  • Humans
  • Protein Structure, Tertiary / genetics
  • Protein Structure, Tertiary / physiology
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / physiology
  • Proteome / chemistry
  • Proteome / genetics
  • Proteome / physiology
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / genetics
  • Saccharomyces cerevisiae Proteins / physiology

Substances

  • Arabidopsis Proteins
  • Archaeal Proteins
  • Caenorhabditis elegans Proteins
  • Drosophila Proteins
  • Proteins
  • Proteome
  • Saccharomyces cerevisiae Proteins