Patterns of protein-fold usage in eight microbial genomes: a comprehensive structural census

Proteins. 1998 Dec 1;33(4):518-34. doi: 10.1002/(sici)1097-0134(19981201)33:4<518::aid-prot5>3.0.co;2-j.

Abstract

Eight microbial genomes are compared in terms of protein structure. Specifically, yeast, H. influenzae, M. genitalium, M. jannaschii, Synechocystis, M. pneumoniae, H. pylori, and E. coli are compared in terms of patterns of fold usage, that is, whether a given fold occurs in a particular organism. Of the approximately 340 soluble protein folds currently in the structure databank (PDB), 240 occur in at least one of the eight genomes, and 30 are shared amongst all eight. The shared folds are depleted in all-helical structure and enriched in mixed helix-sheet structure compared to the folds in the PDB. The top-10 most common of the shared 30 are enriched in superfolds, uniting many non-homologous sequence families, and are especially similar in overall architecture: eight have helices packed onto a central sheet. They are also very different from the common folds in the PDB, highlighting databank biases. Folds can be ranked in terms of expression as well as genome duplication. In yeast the top-10 most highly expressed folds are considerably different from the most highly duplicated folds. A tree can be constructed grouping genomes in terms of their shared folds. This has a remarkably similar topology to more conventional classifications, based on very different measures of relatedness. Finally, folds of membrane proteins can be analyzed through transmembrane-helix (TM) prediction. All the genomes appear to have similar usage patterns for these folds, with the occurrence of a particular fold falling off rapidly with increasing numbers of TM-elements, according to a "Zipf-like" law. This implies there are no marked preferences for proteins with particular numbers of TM-helices (e.g., 7-TM) in microbial genomes.
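
The fold-usage comparison described above can be sketched as set operations over per-genome fold lists. The following is a minimal illustration only, not the authors' method: the genome labels, the fold assignments, and the helper names (`shared_folds`, `folds_in_any`, `shared_fold_distance`, `distance_table`) are all hypothetical, and the Jaccard-style distance is an assumed stand-in, since the abstract does not state which measure was used to build the tree.

```python
from itertools import combinations

# Hypothetical toy data: which folds were detected in each genome.
# Real input would come from matching genome sequences against PDB folds.
fold_usage = {
    "genome_A": {"TIM-barrel", "Rossmann", "ferredoxin-like", "globin"},
    "genome_B": {"TIM-barrel", "Rossmann", "ferredoxin-like"},
    "genome_C": {"TIM-barrel", "Rossmann", "OB-fold"},
}


def shared_folds(usage):
    """Folds that occur in every genome (the abstract's 'shared' set)."""
    return set.intersection(*usage.values())


def folds_in_any(usage):
    """Folds that occur in at least one genome."""
    return set.union(*usage.values())


def shared_fold_distance(a, b):
    """Assumed distance: 1 - |A & B| / |A | B| (Jaccard distance)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_table(usage):
    """Pairwise distances between genomes, keyed by sorted name pairs."""
    return {
        (g1, g2): shared_fold_distance(usage[g1], usage[g2])
        for g1, g2 in combinations(sorted(usage), 2)
    }


if __name__ == "__main__":
    print("shared by all:", sorted(shared_folds(fold_usage)))
    print("in at least one:", len(folds_in_any(fold_usage)))
    for pair, dist in distance_table(fold_usage).items():
        print(pair, round(dist, 2))
```

A table of pairwise distances like this one could then be passed to any standard hierarchical clustering routine to group genomes by their shared folds.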

Publication types

  • Comparative Study

MeSH terms

  • Archaea / genetics
  • Cyanobacteria / chemistry
  • Databases, Factual
  • Escherichia coli / chemistry
  • Genome, Bacterial*
  • Genome, Fungal*
  • Haemophilus influenzae / chemistry
  • Helicobacter pylori / chemistry
  • Membrane Proteins / chemistry
  • Methanococcus / chemistry
  • Models, Molecular
  • Models, Statistical
  • Mycoplasma / chemistry
  • Mycoplasma pneumoniae / chemistry
  • Protein Folding*
  • Saccharomyces cerevisiae / chemistry
  • Sequence Homology, Amino Acid
  • Sequence Homology, Nucleic Acid

Substances

  • Membrane Proteins