The role SWISS-PROT and TrEMBL play in the genome research environment

V Junker; S Contrino; W Fleischmann; H Hermjakob; F Lang; M Magrane; M J Martin; N Mitaritonna; C O'Donovan; R Apweiler

doi:10.1016/s0168-1656(00)00198-x

J Biotechnol. 2000 Mar 31;78(3):221-34. doi: 10.1016/s0168-1656(00)00198-x.

Affiliation

V Junker: EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK.

PMID: 10751683

Abstract

SWISS-PROT, a curated protein sequence data bank, contains not only sequence data but also annotation relevant to each sequence. This annotation is added by a team of biologists and is drawn primarily from journal articles reporting the sequencing and, in some cases, the characterisation of the protein. Review articles, collaboration with external experts, secondary databases such as PROSITE and Pfam, and a variety of feature prediction methods also contribute. Annotation derived from these methods is checked for its relevance to, and likelihood of applying to, the particular sequence. The onset of genome sequencing has dramatically increased the volume of sequence data to be included in SWISS-PROT, which led to the production of TrEMBL (Translation of the EMBL database). TrEMBL consists of SWISS-PROT-format entries derived from the translation of all coding sequences in the EMBL nucleotide sequence database that are not already in SWISS-PROT. Unlike SWISS-PROT entries, those in TrEMBL are still awaiting manual annotation. However, rather than containing only basic sequence and source information, TrEMBL entries are supplemented with features and annotation added automatically. The aim is to give each TrEMBL entry some indication of what the protein is, could be or may be.
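The abstract describes TrEMBL as SWISS-PROT-format entries obtained by translating EMBL coding sequences that are not already in SWISS-PROT. The Python sketch below illustrates only that idea, under simplifying assumptions that are not taken from the paper. Translation uses the standard genetic code. A coding sequence counts as "already in SWISS-PROT" only when its translation exactly matches a curated sequence. The output is a reduced flat-file entry with just ID and SQ lines. The function names and record identifiers (`translate`, `build_trembl_candidates`, `to_flat_entry`, `CDS1`, ...) are hypothetical, and this is not the EBI production pipeline.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon (assumption)."""
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # ambiguous codon -> X
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def build_trembl_candidates(cds_records: dict, swissprot_sequences: set) -> dict:
    """Hypothetical filter: keep translations not exactly present in the curated set."""
    curated = set(swissprot_sequences)
    candidates = {}
    for cds_id, cds in cds_records.items():
        protein = translate(cds)
        if protein and protein not in curated:
            candidates[cds_id] = protein
    return candidates


def to_flat_entry(entry_id: str, protein: str) -> str:
    """Render a simplified SWISS-PROT-style entry: ID, SQ, sequence lines, '//'."""
    lines = [f"ID   {entry_id}", f"SQ   SEQUENCE   {len(protein)} AA;"]
    for start in range(0, len(protein), 60):  # 60 residues per line
        chunk = protein[start:start + 60]
        blocks = [chunk[j:j + 10] for j in range(0, len(chunk), 10)]  # blocks of 10
        lines.append("     " + " ".join(blocks))
    lines.append("//")
    return "\n".join(lines)


if __name__ == "__main__":
    cds_records = {"CDS1": "ATGGCTTGGTAA", "CDS2": "ATGAAATTTTAG"}
    swissprot = {"MAW"}  # CDS1 translates to MAW, so it is treated as already curated
    for cds_id, protein in build_trembl_candidates(cds_records, swissprot).items():
        print(to_flat_entry(cds_id, protein))
```

Exact-match filtering is the simplest possible redundancy check. This sketch ignores near-identical sequences such as fragments or variants, and it adds none of the automatic annotation the abstract mentions.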

MeSH terms

Amino Acid Sequence
Animals
Biotechnology
Databases, Factual*
Genome*
Humans
Molecular Sequence Data
Proteins / genetics*
Research
Sequence Alignment

Substances

Proteins