The relationship between proteome size, structural disorder and organism complexity

Genome Biol. 2011 Dec 19;12(12):R120. doi: 10.1186/gb-2011-12-12-r120.

Abstract

Background: Sequencing the genomes of the first few eukaryotes created the impression that gene number shows no correlation with organism complexity, an observation often referred to as the G-value paradox. Several attempts have been made to resolve this paradox, citing multifunctionality of proteins, alternative splicing, microRNAs or non-coding DNA. As intrinsic protein disorder has been linked with complex responses to environmental stimuli and with communication between cells, an additional possibility is that structural disorder effectively increases the complexity of species.

Results: We revisited the G-value paradox by analyzing many new proteomes for which organism complexity, measured as the number of distinct cell types, is known. We found that complexity and proteome size, measured as the total number of amino acids, correlate significantly and follow a power-function relationship. We also systematically analyzed numerous other features in relation to complexity across several organisms and tissues and found that: the fraction of protein structural disorder increases significantly between prokaryotes and eukaryotes but does not increase further over the course of evolution; the number of predicted binding sites in disordered regions of a proteome increases with complexity; and the fraction of protein disorder, predicted binding sites, alternative splicing and protein-protein interactions all increase with the complexity of human tissues.
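A power-function relationship between complexity C (number of distinct cell types) and proteome size S (total amino acids) has the form C = a·S^b, so log C = log a + b·log S. One common way to estimate a and b, offered here as an illustration and not necessarily the procedure used in the study, is ordinary least squares on the log-transformed values; the Pearson correlation on the same log scale then measures how well a power law fits. The minimal Python sketch below assumes strictly positive paired values. The function names and the synthetic numbers in the usage are hypothetical and are not data from the paper.

```python
import math


def _log_pairs(sizes, complexities):
    """Return natural logs of paired, strictly positive observations."""
    if len(sizes) != len(complexities) or len(sizes) < 2:
        raise ValueError("need at least two paired observations")
    if any(v <= 0 for v in list(sizes) + list(complexities)):
        raise ValueError("power-law fitting requires positive values")
    return [math.log(s) for s in sizes], [math.log(c) for c in complexities]


def fit_power_law(sizes, complexities):
    """Fit complexity = a * size**b by least squares on log-log data; return (a, b)."""
    xs, ys = _log_pairs(sizes, complexities)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log scale = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept on the log scale = log(a)
    return a, b


def log_log_pearson(sizes, complexities):
    """Pearson correlation between log(size) and log(complexity)."""
    xs, ys = _log_pairs(sizes, complexities)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical, noise-free synthetic data generated from a = 0.5, b = 0.75.
    sizes = [1e6, 2e6, 5e6, 1e7, 2e7]
    complexities = [0.5 * s ** 0.75 for s in sizes]
    a, b = fit_power_law(sizes, complexities)
    print(f"a={a:.3f}, b={b:.3f}")  # prints "a=0.500, b=0.750"
```

With real proteome data the points would scatter around the fitted line, and the log-scale correlation would fall below 1; a separate fit per group (for example plants versus metazoans) would be needed to compare their relationships.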

Conclusions: We conclude that complexity is a multi-parametric trait, determined by interaction potential, alternative splicing capacity, tissue-specific protein disorder and, above all, proteome size. The G-value paradox is only apparent when plants are grouped with metazoans, because plants have a different relationship between complexity and proteome size.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alternative Splicing
  • Amino Acids / genetics
  • Bacteria / genetics*
  • Binding Sites
  • Biological Evolution
  • Computational Biology
  • Databases, Protein
  • Eukaryota / genetics*
  • Genome Size
  • Humans
  • Plants / genetics*
  • Protein Folding
  • Protein Interaction Maps
  • Proteome / genetics*
  • Proteostasis Deficiencies / genetics*

Substances

  • Amino Acids
  • Proteome