Mining susceptibility gene modules and disease risk genes from SNP data by combining network topological properties with support vector regression

J Theor Biol. 2011 Nov 21:289:225-36. doi: 10.1016/j.jtbi.2011.08.040. Epub 2011 Sep 5.

Abstract

Genome-wide association studies are a powerful approach for identifying disease risk loci. However, the molecular regulatory mechanisms of most complex diseases are still not well understood. Further investigation of the interplay between genetic factors and biological networks is therefore important for elucidating these mechanisms. Here, we propose a novel framework that starts at the single nucleotide polymorphism (SNP) level and identifies susceptibility gene modules and disease risk genes by combining network topological properties with support vector regression. We assigned risk SNPs to genes using the University of California at Santa Cruz (UCSC) genome database, and then mapped these genes to protein-protein interaction (PPI) networks. The gene modules implicated by hub genes were extracted from the PPI networks, and the topological properties of these modules were analyzed. For each gene module, risk feature genes were determined by topological property analysis and support vector regression. As a result, five shared risk feature genes, CD80, EGFR, FN1, GSK3B and TRAF6, were found, all of which have been associated with rheumatoid arthritis in previous reports. Our approach performed well in comparison with other approaches and can be used for prioritizing candidate genes associated with complex diseases.
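The module-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how hubs or modules are defined, so the definitions below are assumptions made for illustration only: a hub is any gene whose degree meets a hypothetical `min_degree` threshold, and a module is the hub together with its direct interaction partners. The gene labels and edges are placeholders, not data from the study, and the support vector regression step is not covered.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (gene, gene) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def hub_modules(adj, min_degree):
    """Return {hub: module} for every gene with degree >= min_degree.

    Assumption (not from the abstract): a module is the hub plus its
    first-order neighbors in the PPI network.
    """
    return {
        gene: {gene} | neighbors
        for gene, neighbors in adj.items()
        if len(neighbors) >= min_degree
    }


# Placeholder interactions among placeholder genes A..F (illustrative only).
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("E", "B"), ("E", "C"), ("E", "F"),
    ("D", "F"),
]

adj = build_network(edges)
degree = {gene: len(neighbors) for gene, neighbors in adj.items()}
modules = hub_modules(adj, min_degree=3)

for hub in sorted(modules):
    print(hub, degree[hub], sorted(modules[hub]))
```

Degree is only one of several possible topological properties. Under the same assumptions, other measures such as betweenness or clustering coefficient could be computed per module and passed as features to a regression model.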

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arthritis, Rheumatoid / genetics
  • Chromosome Mapping / methods
  • Databases, Genetic
  • Diabetes Mellitus, Type 1 / genetics
  • Gene Regulatory Networks / genetics*
  • Genetic Predisposition to Disease / genetics*
  • Genome-Wide Association Study
  • Humans
  • Polymorphism, Single Nucleotide*
  • Protein Interaction Maps / genetics
  • Support Vector Machine