Deep sequencing of samples from thousands of subjects across several populations has revealed a remarkable degree of genetic variation in the protein-encoding regions of DNA. Even in normal, healthy subjects, approximately half of the 20 000 single nucleotide polymorphisms present are nonsynonymous, producing amino acid substitutions that could potentially affect protein function. The greatest challenges currently facing investigators are data interpretation and the development of strategies to identify the few coding variants that actually cause or confer susceptibility to disease. A confusing array of computational tools is available to address this problem. Unfortunately, the overall accuracy of these tools at ultraconserved positions is low, and their predictions may mislead researchers involved in downstream experimental and clinical studies. Here, we present an updated review of these tools and their primary functionalities, focusing on those naturally suited to analyzing massive variant sets, in order to infer similarities among their results. In addition, we evaluate the prediction congruency of some of these web-based tools on real whole-exome sequencing data in a proof-of-concept study.
Keywords: SNP classification; nonsynonymous SNP; pathological effect; whole-exome sequencing.