We have developed a method for predicting proteins involved in genetic disorders. The method combines deleterious SNP prediction with a system based on protein interactions and phenotype distances; this is the first time that deleterious SNP prediction has been used to make predictions across linkage intervals. At each step we tested the candidate procedures and selected the best-performing one, which revealed that the computationally expensive method of assigning medical meta-terms to create a phenotype distance matrix was outperformed by a simple word-counting technique. We carried out in-depth benchmarking with increasingly stringent data sets, reaching precision values of up to 75% (at 19% recall) for 10-Mb linkage intervals (averaging 100 genes). For the most stringent (worst-case) data set, overall recall was 6%, yet precision still reached up to 90% (at 4% recall). At all levels of stringency and precision, adding predicted deleterious SNPs increased recall.
© 2009 Wiley-Liss, Inc.