The availability of the complete human genome sequence has dramatically facilitated the search for disease-causing sequence variations. Consequently, the rate-limiting step has shifted from discovering and characterizing candidate genes to screening human populations and interpreting the variations observed. In this study we tested the hypothesis that some segments of candidate genes are more likely than others to contain disease-causing variations, and that these segments can be predicted bioinformatically. We developed a bioinformatic technique, prioritization of annotated regions (PAR), that uses conserved protein functional domains and protein secondary structures to predict the likelihood that a specific coding region of a gene harbors a disease-causing mutation. We evaluated the method on 710 genes that collectively harbor 4,498 previously identified mutations. Nearly 50% of the genes were recognized as disease-associated after screening only 9% of the complete coding sequence. PAR identified 90% of the genes as containing at least one mutation while using less than 40% of the screening resources that traditional approaches would require. These results suggest that prioritization strategies such as PAR can accelerate disease-gene identification by making more efficient use of screening resources.
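The screening-efficiency figures in the abstract (share of genes detected versus share of coding sequence screened) can be illustrated with a minimal Python sketch. It assumes each coding region already carries a numeric priority score and a flag marking whether a known disease-causing mutation falls in it; the `Region` record, the score values, and the helper names `detection_curve` and `cost_to_reach` are hypothetical and do not reproduce PAR's actual scoring of functional domains and secondary structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One coding region of a gene (hypothetical record, not PAR's data model)."""
    gene: str
    length: int          # number of coding bases in the region
    score: float         # assumed priority score: higher means screen earlier
    has_mutation: bool   # True if a known disease-causing mutation lies here


def detection_curve(regions):
    """Screen regions in descending score order across all genes.

    Returns a list of (fraction of coding sequence screened,
    fraction of genes with at least one mutation found) after each region.
    """
    total_bases = sum(r.length for r in regions)
    all_genes = {r.gene for r in regions}
    detected = set()
    screened = 0
    curve = []
    for r in sorted(regions, key=lambda r: r.score, reverse=True):
        screened += r.length
        if r.has_mutation:
            detected.add(r.gene)  # a gene counts once, at its first hit
        curve.append((screened / total_bases, len(detected) / len(all_genes)))
    return curve


def cost_to_reach(curve, target):
    """Smallest sequence fraction at which the detected-gene fraction
    reaches `target`, or None if it is never reached."""
    for seq_frac, gene_frac in curve:
        if gene_frac >= target:
            return seq_frac
    return None


# Toy data (invented for illustration): two genes, 1,000 coding bases total.
toy = [
    Region("A", 100, 0.9, True),
    Region("A", 400, 0.2, False),
    Region("B", 150, 0.8, False),
    Region("B", 350, 0.1, True),
]
curve = detection_curve(toy)
print(cost_to_reach(curve, 0.5))  # 0.1: half the genes found after 10% of bases
print(cost_to_reach(curve, 1.0))  # 1.0: gene B's mutation sits in its lowest-ranked region
```

Comparing `cost_to_reach` for a prioritized ordering against an unprioritized one (for example, screening each gene's regions in sequence order) gives the kind of resource comparison the abstract reports; the toy numbers here carry no relation to the study's results.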
© 2006 Wiley-Liss, Inc.