Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker

Hum Mutat. 2008 Jan;29(1):6-13. doi: 10.1002/humu.20654.

Abstract

Unambiguous and correct sequence variant descriptions are of utmost importance, not least because mistakes and uncertainties may lead to errors in clinical diagnosis. We developed the Mutation Analyzer (Mutalyzer) sequence variation nomenclature checker (www.lovd.nl/mutalyzer; last accessed 13 September 2007) for automated analysis and correction of sequence variant descriptions using reference sequences from any organism. Mutalyzer handles most variation types (substitution, deletion, duplication, insertion, indel, and splice-site changes), following the current recommendations of the Human Genome Variation Society (HGVS). Input is a GenBank accession number or an uploaded reference sequence file in GenBank format with user-modified annotation, an HGNC gene symbol, and the variant (single or in a batch file). Mutalyzer generates variant descriptions at the DNA level and at the level of all annotated transcripts, together with the deduced outcome at the protein level. To validate Mutalyzer's performance and to investigate the quality of sequence variant descriptions in locus-specific mutation databases (LSDBs), more than 11,000 variants in the PAH, BIC BRCA2, and HbVar databases were analyzed, showing that 87%, 25%, and 38%, respectively, were error-free and followed the recommendations. Low recognition rates in BIC and HbVar (38% and 51%, respectively) were due to the lack of a well-annotated genomic reference sequence (HbVar) or noncompliance with the guidelines (BRCA2). Provided with well-annotated genomic reference sequences, Mutalyzer is very effective for the curation of newly discovered sequence variation descriptions and existing LSDB data. Mutalyzer will be linked to the Leiden Open source Variation Database (LOVD) (www.LOVD.nl; last accessed 13 September 2007) and is the first module of a sequence variant effect prediction package.
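As an illustration of the variation types listed above, the following minimal sketch sorts HGVS-style DNA descriptions into those categories by syntax alone. It is not Mutalyzer's implementation: the function name `classify_variant`, the pattern table, and the simplified grammar (only the `c.` and `g.` prefixes, plain or intron-offset positions, no reference-sequence lookup) are assumptions made for this example. Unlike Mutalyzer, it does not check the description against a reference sequence.

```python
import re

# Simplified position grammar (an assumption for this sketch):
# optional "-" or "*" prefix, a number, and an optional intronic offset such as "+1".
_POS = r"[-*]?\d+(?:[+-]\d+)?"
# A single position or a "start_end" range.
_RANGE = rf"{_POS}(?:_{_POS})?"

# Hypothetical pattern table covering a small subset of HGVS-style syntax.
PATTERNS = [
    ("substitution", re.compile(rf"^[cg]\.{_POS}[ACGT]>[ACGT]$")),
    ("indel", re.compile(rf"^[cg]\.{_RANGE}delins[ACGT]+$")),
    ("deletion", re.compile(rf"^[cg]\.{_RANGE}del[ACGT]*$")),
    ("duplication", re.compile(rf"^[cg]\.{_RANGE}dup[ACGT]*$")),
    ("insertion", re.compile(rf"^[cg]\.{_POS}_{_POS}ins[ACGT]+$")),
]


def classify_variant(description):
    """Return the variation type of a description, or None if unrecognized."""
    for name, pattern in PATTERNS:
        if pattern.match(description):
            return name
    return None


if __name__ == "__main__":
    for text in ["c.76A>T", "c.76_78del", "c.76dup", "c.76_77insT",
                 "c.76_78delinsTG", "c.88+1G>A", "76A>T"]:
        print(text, "->", classify_variant(text))
```

A syntactic check of this kind can only flag descriptions that do not parse. Verifying that the stated nucleotides match the reference sequence, which is the core of what the abstract describes, requires the annotated GenBank record.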

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Amino Acid Sequence
  • Base Sequence
  • Computational Biology
  • DNA Mutational Analysis*
  • Databases, Nucleic Acid*
  • Genome, Human
  • Humans
  • Molecular Sequence Data
  • Mutation*
  • Polymorphism, Single Nucleotide
  • Software*
  • Terminology as Topic