Ensembl comparative genomics resources

Javier Herrero; Matthieu Muffato; Kathryn Beal; Stephen Fitzgerald; Leo Gordon; Miguel Pignatelli; Albert J Vilella; Stephen M J Searle; Ridwan Amode; Simon Brent; William Spooner; Eugene Kulesha; Andrew Yates; Paul Flicek

doi:10.1093/database/bav096

Database (Oxford). 2016 Feb 20:2016:bav096. doi: 10.1093/database/bav096. Print 2016.

Affiliations

¹ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, and Bill Lyons Informatics Centre, UCL Cancer Institute, University College London, London WC1E 6DD.
² European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD.
³ Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA.
⁴ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA.
⁵ Eagle Genomics Ltd., Babraham Research Campus, Cambridge, CB22 3AT, UK, and Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA.
⁶ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA.

Abstract

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolution-related questions. However, the complexity and computational requirements of producing such data are substantial; as a result, only a small number of reference resources are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects, including Ensembl Genomes and Gramene, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available.

Database URL: http://www.ensembl.org.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Computational Biology / methods*
DNA, Complementary / genetics
Databases, Genetic
Evolution, Molecular
Expressed Sequence Tags
Genome*
Genomics*
Humans
Phylogeny
Quality Control
RNA, Untranslated / genetics
Sequence Alignment
Sequence Analysis, RNA
Software

Substances

DNA, Complementary
RNA, Untranslated
