Rich annotation of DNA sequencing variants by leveraging the Ensembl Variant Effect Predictor with plugins

Brief Bioinform. 2015 Mar;16(2):255-64. doi: 10.1093/bib/bbu008. Epub 2014 Mar 12.

Abstract

High-throughput DNA sequencing has become a mainstay for the discovery of genomic variants that may cause disease or affect phenotype. A next-generation sequencing pipeline typically identifies thousands of variants in each sample. A particular challenge is the annotation of each variant in a way that is useful to downstream consumers of the data, such as clinical sequencing centers or researchers. These users may require that all data storage and analysis remain on secure local servers to protect patient confidentiality or intellectual property, may have unique and changing needs to draw on a variety of annotation data sets, and may prefer not to rely on closed-source applications beyond their control. Here we describe scalable methods for using the plugin capability of the Ensembl Variant Effect Predictor to enrich its basic set of variant annotations with additional data on genes, function, conservation, expression, diseases, pathways and protein structure, and we describe an extensible framework for easily adding custom data sets.
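As a conceptual illustration only, the plugin idea in the abstract can be pictured as a pipeline in which independent annotators each add fields to a variant record, so that a new data set is added by registering one more annotator. The sketch below is not the Ensembl Variant Effect Predictor API (its plugins are Perl modules) and is not the authors' implementation. Every name in it (`AnnotationPipeline`, `pathway_annotator`, `conservation_annotator`, the field names, the gene and pathway labels, and the score) is hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

# A variant record is modelled as a plain dict. Field names are hypothetical.
Variant = Dict[str, object]
# An annotator takes a variant and returns the new fields it contributes.
Annotator = Callable[[Variant], Dict[str, object]]


class AnnotationPipeline:
    """Runs registered annotators over a variant (illustrative, not VEP)."""

    def __init__(self) -> None:
        self._annotators: List[Annotator] = []

    def register(self, annotator: Annotator) -> None:
        # Adding a data set means registering one more annotator.
        self._annotators.append(annotator)

    def annotate(self, variant: Variant) -> Variant:
        # Work on a copy so the caller's record is left unchanged.
        enriched = dict(variant)
        for annotator in self._annotators:
            enriched.update(annotator(enriched))
        return enriched


# Hypothetical in-memory tables standing in for locally stored data sets.
GENE_PATHWAYS = {"GENE_A": ["pathway_1", "pathway_2"]}
CONSERVATION_SCORES = {("1", 12345): 0.97}


def pathway_annotator(variant: Variant) -> Dict[str, object]:
    # Look up pathways by gene symbol; an unknown gene yields an empty list.
    return {"pathways": GENE_PATHWAYS.get(variant.get("gene"), [])}


def conservation_annotator(variant: Variant) -> Dict[str, object]:
    # Look up a score by position; a missing position yields None.
    key = (variant.get("chrom"), variant.get("pos"))
    return {"conservation": CONSERVATION_SCORES.get(key)}


pipeline = AnnotationPipeline()
pipeline.register(pathway_annotator)
pipeline.register(conservation_annotator)

variant = {"chrom": "1", "pos": 12345, "ref": "A", "alt": "G", "gene": "GENE_A"}
result = pipeline.annotate(variant)
print(result)
```

Because each annotator reads only local tables, a design of this kind keeps all lookups on the user's own servers, which matches the confidentiality requirement the abstract raises.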

Keywords: DNA sequencing; Ensembl Variant Effect Predictor; annotation; database; plugin.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Databases, Nucleic Acid / statistics & numerical data
  • Genetic Variation
  • High-Throughput Nucleotide Sequencing / statistics & numerical data*
  • Humans
  • Molecular Sequence Annotation / statistics & numerical data*
  • Sequence Analysis, DNA / statistics & numerical data*
  • Software