B2G-FAR, a species-centered GO annotation repository

Bioinformatics. 2011 Apr 1;27(7):919-24. doi: 10.1093/bioinformatics/btr059. Epub 2011 Feb 18.

Abstract

Motivation: Functional genomics research has expanded enormously in the last decade thanks to the falling cost of high-throughput technologies and the development of computational tools that generate, standardize and share information on gene and protein function, such as the Gene Ontology (GO). Nevertheless, many biologists, especially those working with non-model organisms, still suffer from nonexistent or low-coverage functional annotation, or simply struggle to retrieve, summarize and query these data.

Results: The Blast2GO Functional Annotation Repository (B2G-FAR) is a bioinformatics resource envisaged to provide functional information for otherwise uncharacterized sequence data, and it offers data mining tools to analyze a larger repertoire of species than is currently available. This new annotation resource was created by applying the Blast2GO functional annotation engine in a high-throughput manner to the entire space of publicly available sequences. The resulting repository contains GO term predictions for over 13.2 million non-redundant protein sequences based on BLAST search alignments from the SIMAP database. We generated GO annotations for approximately 150 000 different taxa and made the 2000 species with the highest coverage available through B2G-FAR. A second section within B2G-FAR holds functional annotations for 17 non-model organism Affymetrix GeneChips.
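
The selection of the highest-coverage species can be illustrated with a minimal sketch. It is not the B2G-FAR pipeline: the abstract does not define "coverage", so the code assumes it means the fraction of a taxon's sequences that carry at least one GO term. The function names, the record layout and the sample data are all hypothetical.

```python
from collections import defaultdict


def coverage_by_taxon(records):
    """Return {taxon: fraction of its sequences with at least one GO term}.

    `records` is an iterable of (taxon, sequence_id, go_terms) tuples.
    This layout is an assumption made for illustration only.
    """
    total = defaultdict(int)
    annotated = defaultdict(int)
    for taxon, _sequence_id, go_terms in records:
        total[taxon] += 1
        if go_terms:  # an empty list means the sequence has no annotation
            annotated[taxon] += 1
    return {taxon: annotated[taxon] / total[taxon] for taxon in total}


def top_taxa(records, n):
    """Return the n taxa with the highest coverage (ties broken by name)."""
    coverage = coverage_by_taxon(records)
    return sorted(coverage, key=lambda taxon: (-coverage[taxon], taxon))[:n]


# Made-up sample data; the GO identifiers are the three ontology roots.
sample_records = [
    ("Taxon A", "a1", ["GO:0008150"]),
    ("Taxon A", "a2", []),
    ("Taxon B", "b1", ["GO:0003674", "GO:0005575"]),
    ("Taxon C", "c1", []),
]
best_two = top_taxa(sample_records, 2)
```

In the setting described above, `n` would be 2000 and the records would span roughly 150 000 taxa; a real pipeline would likely also weigh the number of sequences per taxon, which this sketch ignores.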

Conclusions: B2G-FAR provides easy access to exhaustive functional annotation for 2000 species, offering a good balance between quality and quantity and thereby supporting functional genomics research, especially in non-model organisms.

Availability: The annotation resource is available at http://www.b2gfar.org.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining
  • Databases, Genetic
  • Genes
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Annotation*
  • Oligonucleotide Array Sequence Analysis
  • Sequence Analysis, Protein
  • Software*
  • Vocabulary, Controlled