Status quo of annotation of human disease variants

BMC Bioinformatics. 2013 Dec 4;14:352. doi: 10.1186/1471-2105-14-352.

Abstract

Background: Ongoing technical developments in next-generation sequencing have led to an increase in detected disease-related mutations. Many bioinformatics approaches exist to analyse these variants, and those that use 3D structure information generally outperform those that do not. 3D structure information is currently available for about twenty percent of the human exome, and homology modelling can double that fraction. This percentage is increasing rapidly, so we can expect to analyse the majority of human exome variants using protein structure information in the near future.

Results: We collected a test dataset of well-described mutations in proteins for which 3D structure information is available. This test dataset was used to analyse the possibilities and limitations of methods based on sequence information alone, hybrid methods, machine-learning-based methods, and structure-based methods.

Conclusions: Our analysis shows that the use of structural features improves the classification of mutations. This study suggests strategies for future analyses of disease-causing mutations and indicates which bioinformatics approaches should be developed to make progress in this field.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Cluster Analysis
  • Computational Biology / methods*
  • Conserved Sequence / genetics
  • Databases, Genetic
  • Exome / genetics
  • Genetic Variation*
  • Genome, Human / genetics
  • High-Throughput Nucleotide Sequencing / methods
  • High-Throughput Nucleotide Sequencing / trends
  • Humans
  • Molecular Sequence Annotation / methods*
  • Mutation / genetics
  • Polymorphism, Single Nucleotide / genetics
  • Proteins / chemistry
  • Proteins / genetics*
  • Sequence Alignment / trends
  • Sequence Homology, Amino Acid

Substances

  • Proteins