Gene-based rare variant association studies (RVASs) have low power due to the infrequency of rare variants and the large multiple testing burden. To correct for multiple testing, traditional false discovery rate (FDR) procedures which depend solely on P-values are often used. Recently, Independent Hypothesis Weighting (IHW) was developed to improve the detection power while maintaining FDR control by leveraging prior information for each hypothesis. Here, we present a framework to increase power of gene-based RVASs by incorporating prior information using IHW. We first build supervised machine learning models to assign each gene a prediction score that measures its disease risk, using the input of multiple biological features, fed with high-confidence risk genes and local background genes selected near GWAS significant loci as the training set. Then we use the prediction scores as covariates to prioritize RVAS results via IHW. We demonstrate the effectiveness of this framework through applications to RVASs in schizophrenia and autism spectrum disorder. We found sizeable improvements in the number of significant associations compared to traditional FDR approaches, and independent evidence supporting the relevance of the genes identified by our framework but not traditional FDR, demonstrating the potential of our framework to improve power of gene-based RVASs.
Keywords: GWAS; false discovery rate; gene prioritization; machine learning; multiple hypothesis testing; neuropsychiatric disorders; rare variant association study.