The CATH classification revisited--architectures reviewed and new ways to characterize structural divergence in superfamilies

Nucleic Acids Res. 2009 Jan;37(Database issue):D310-4. doi: 10.1093/nar/gkn877. Epub 2008 Nov 7.

Abstract

The latest version of CATH (class, architecture, topology, homology) (version 3.2), released in July 2008 (http://www.cathdb.info), contains 114,215 domains, 2,178 homologous superfamilies and 1,110 fold groups. We have assigned 20,330 new domains, 87 new homologous superfamilies and 26 new folds since CATH version 3.1. A total of 28,064 new domains have been assigned since our NAR 2007 database publication (CATH version 3.0). The CATH website has been completely redesigned and includes more comprehensive documentation. We have revisited the CATH architecture level as part of the development of a 'Protein Chart' and present information on the population of each architecture. The CATHEDRAL structure comparison algorithm has been improved and used to characterize structural diversity in CATH superfamilies and structural overlaps between superfamilies. Although the majority of superfamilies in CATH are not structurally diverse and do not overlap significantly with other superfamilies, approximately 4% of superfamilies are very diverse, and these are the superfamilies that are most highly populated both in the PDB and in the genomes. Information on the degree of structural diversity in each superfamily and on structural overlaps between superfamilies can now be downloaded from the CATH website.
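
The release figures above imply approximate totals for the earlier releases. The following is a minimal sketch of that arithmetic, not a calculation taken from the paper. It assumes that every new assignment is a net addition, with no domains, superfamilies or folds removed or merged between releases, and it treats "approximately 4%" as exactly 0.04. All variable and function names are hypothetical.

```python
# Figures quoted in the abstract for CATH version 3.2.
V3_2 = {"domains": 114_215, "superfamilies": 2_178, "folds": 1_110}

# New assignments since version 3.1, as quoted in the abstract.
NEW_SINCE_V3_1 = {"domains": 20_330, "superfamilies": 87, "folds": 26}

# New domains since version 3.0, as quoted in the abstract.
NEW_DOMAINS_SINCE_V3_0 = 28_064


def implied_previous(current, added):
    """Subtract new assignments from current totals.

    Assumption: nothing was removed or merged between releases, so the
    result is only an estimate of the earlier release's size.
    """
    return {key: current[key] - added[key] for key in added}


# Estimated totals for version 3.1.
v3_1 = implied_previous(V3_2, NEW_SINCE_V3_1)

# Estimated domain count for version 3.0.
v3_0_domains = V3_2["domains"] - NEW_DOMAINS_SINCE_V3_0

# Domains added between versions 3.0 and 3.1 under the same assumption.
domains_3_0_to_3_1 = NEW_DOMAINS_SINCE_V3_0 - NEW_SINCE_V3_1["domains"]

# Rough number of "very diverse" superfamilies, taking 4% literally.
very_diverse = round(0.04 * V3_2["superfamilies"])

print("v3.1 estimate:", v3_1)
print("v3.0 domains estimate:", v3_0_domains)
print("domains added 3.0 -> 3.1:", domains_3_0_to_3_1)
print("very diverse superfamilies (approx.):", very_diverse)
```

Under these assumptions, roughly 87 of the 2,178 superfamilies would fall into the very diverse group. The exact count depends on the diversity threshold used in the paper, which the abstract does not state.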

MeSH terms

  • Databases, Protein*
  • Models, Molecular
  • Protein Folding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary*
  • Proteins / classification
  • Sequence Homology, Amino Acid

Substances

  • Proteins