Gene-set analyses are used to assess whether there is any evidence of association with disease among a set of biologically related genes. Such an analysis typically treats all genes within the sets similarly, even though there is substantial, external, information concerning the likely importance of each gene within each set. For example, for traits that are under purifying selection, we would expect genes showing extensive genic constraint to be more likely to be trait associated than unconstrained genes. Here we improve gene-set analyses by incorporating such external information into a higher-criticism-based signal detection analysis. We show that when this external information is predictive of whether a gene is associated with disease, our approach can lead to a significant increase in power. Further, our approach is particularly powerful when the signal is sparse, that is when only a small number of genes within the set are associated with the trait. We illustrate our approach with a gene-set analysis of amyotrophic lateral sclerosis (ALS) and implicate a number of gene-sets containing SOD1 and NEK1 as well as showing enrichment of small p values for gene-sets containing known ALS genes. We implement our approach in the R package wHC.
Keywords: amyotrophic lateral sclerosis; gene-set-based analysis; higher criticism; prior information; weighted p values.
© 2020 Wiley Periodicals, Inc.