Background: The use of biological annotation such as genes and pathways in the analysis of gene expression data has aided the identification of genes for follow-up studies and suggested functional information to uncharacterized genes. Several studies have applied similar methods to genome wide association studies and identified a number of disease related pathways. However, many questions remain on how to best approach this problem, such as whether there is a need to obtain a score to summarize association evidence at the gene level, and whether a pathway, dominated by just a few highly significant genes, is of interest.
Methods: We evaluated the performance of two pathway-based methods (Random Set, and Binomial approximation to the hypergeometric test) based on their applications to three data sets of Crohn's disease. We consider both the disease status as a phenotype as well as the residuals after conditioning on IL23R, a known Crohn's related gene, as a phenotype.
Results: Our results show that Random Set method has the most power to identify disease related pathways. We confirm previously reported disease related pathways and provide evidence for IL-2 Receptor Beta Chain in T cell Activation and IL-9 signaling as Crohn's disease associated pathways.
Conclusions: Our results highlight the need to apply powerful gene score methods prior to pathway enrichment tests, and that controlling for genes that attain genome wide significance enable further biological insight.