The National Center for Biotechnology Information's Protein Clusters Database

William Klimke; Richa Agarwala; Azat Badretdin; Slava Chetvernin; Stacy Ciufo; Boris Fedorov; Boris Kiryutin; Kathleen O'Neill; Wolfgang Resch; Sergei Resenchuk; Susan Schafer; Igor Tolstoy; Tatiana Tatusova

doi:10.1093/nar/gkn734

Nucleic Acids Res. 2009 Jan;37(Database issue):D216-23. doi: 10.1093/nar/gkn734. Epub 2008 Oct 21.

Authors

William Klimke¹, Richa Agarwala, Azat Badretdin, Slava Chetvernin, Stacy Ciufo, Boris Fedorov, Boris Kiryutin, Kathleen O'Neill, Wolfgang Resch, Sergei Resenchuk, Susan Schafer, Igor Tolstoy, Tatiana Tatusova

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Abstract

Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) was created to maintain this deluge of data efficiently and keep it up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nucleotide sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. Of these, 7180 clusters containing 376 513 proteins have curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.
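To put the release figures quoted in the abstract in proportion, the short Python sketch below derives three ratios from them. It is illustrative arithmetic only and not part of ProtClustDB or its tooling. The constant and function names are hypothetical, and because the abstract gives the protein total only as "over 1.7 million", that figure is treated as a lower bound, which makes the curated-protein share an upper bound.

```python
# Figures quoted in the abstract for the May 2008 release.
TOTAL_CLUSTERS = 285_386
CURATED_CLUSTERS = 7_180
CURATED_PROTEINS = 376_513
# The abstract says "over 1.7 million proteins", so this is only a floor
# (an assumption about how to read that phrase, not an exact count).
TOTAL_PROTEINS_LOWER_BOUND = 1_700_000


def release_summary():
    """Return simple ratios derived from the quoted release figures."""
    return {
        # Share of all clusters that carry curated annotation.
        "curated_cluster_fraction": CURATED_CLUSTERS / TOTAL_CLUSTERS,
        # Average number of proteins in a curated cluster.
        "mean_curated_cluster_size": CURATED_PROTEINS / CURATED_CLUSTERS,
        # Upper bound on the share of proteins in curated clusters,
        # because the true denominator is larger than 1.7 million.
        "curated_protein_fraction_max": CURATED_PROTEINS / TOTAL_PROTEINS_LOWER_BOUND,
    }


if __name__ == "__main__":
    for name, value in release_summary().items():
        print(f"{name}: {value:.4f}")
```

Read this way, roughly 2.5% of clusters are curated, yet they average about 52 proteins each and hold at most about 22% of all clustered proteins, which suggests that curation effort was concentrated on the larger clusters.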

MeSH terms

Cluster Analysis
Databases, Protein*
Genomics
Proteins / chemistry
Proteins / classification*
Proteins / genetics
Sequence Homology, Amino Acid

Substances

Proteins