DANUBE: Data-driven meta-ANalysis using UnBiased Empirical distributions - applied to biological pathway analysis

Proc IEEE Inst Electr Electron Eng. 2017 Mar;105(3):496-515. doi: 10.1109/jproc.2015.2507119. Epub 2016 Mar 31.

Abstract

Identifying the pathways and mechanisms that are significantly impacted in a given phenotype is challenging. Issues include patient heterogeneity and noise. Many experiments do not have a large enough sample size to achieve the statistical power necessary to identify significantly impacted pathways. Meta-analysis based on combining p-values from individual experiments has been used to improve power. However, all classical meta-analysis approaches work under the assumption that the p-values produced by experiment-level statistical tests follow a uniform distribution under the null hypothesis. Here we show that this assumption does not hold for three mainstream pathway analysis methods, and significant bias is likely to affect many, if not all, such meta-analysis studies. We introduce DANUBE, a novel and unbiased approach to combine statistics computed from individual studies. Our framework uses control samples to construct empirical null distributions, from which empirical p-values of individual studies are calculated and combined using either a Central Limit Theorem approach or the additive method. We assess the performance of DANUBE using four different pathway analysis methods. DANUBE is compared with five meta-analysis approaches, as well as with a pathway analysis approach that employs multiple datasets (MetaPath). The 25 approaches have been tested on 16 different datasets related to two human diseases, Alzheimer's disease (7 datasets) and acute myeloid leukemia (9 datasets). We demonstrate that DANUBE overcomes bias in order to consistently identify relevant pathways. We also show how the framework improves results in more general cases, compared to classical meta-analysis performed with common experiment-level statistical tests such as Wilcoxon and t-test.
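
The combination step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the "+1" correction in the empirical p-value, and the assumption that the null p-values come from control-versus-control comparisons are illustrative choices. The additive method is taken here to be the cumulative distribution of a sum of independent uniform variables (the Irwin-Hall distribution), and the Central Limit Theorem approach to be a normal approximation to the mean of n uniform p-values, with mean 0.5 and variance 1/(12n).

```python
import math
from statistics import NormalDist


def empirical_p_value(observed_p, null_p_values):
    """Empirical p-value of one study (hypothetical helper).

    null_p_values is assumed to hold p-values produced by the same pathway
    analysis method on control-only comparisons. The +1 terms are a common
    correction that keeps the result strictly positive.
    """
    as_extreme = sum(1 for p in null_p_values if p <= observed_p)
    return (as_extreme + 1) / (len(null_p_values) + 1)


def additive_method(p_values):
    """Combined p-value via the Irwin-Hall CDF of the sum of n uniforms.

    The alternating sum can lose precision for large n, so this form is
    best suited to a small number of studies.
    """
    n = len(p_values)
    s = sum(p_values)
    total = 0.0
    for k in range(math.floor(s) + 1):
        total += (-1) ** k * math.comb(n, k) * (s - k) ** n
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, max(0.0, total / math.factorial(n)))


def clt_method(p_values):
    """Combined p-value via a normal approximation to the mean p-value."""
    n = len(p_values)
    mean_p = sum(p_values) / n
    z = (mean_p - 0.5) * math.sqrt(12 * n)
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    null = [0.01, 0.2, 0.5, 0.9]
    study_ps = [empirical_p_value(p, null) for p in (0.05, 0.3)]
    print(additive_method(study_ps), clt_method(study_ps))
```

Both combination functions assume that their inputs are uniform on [0, 1] under the null hypothesis, which is the property the empirical p-values are meant to restore.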

Keywords: Alzheimer’s disease; acute myeloid leukemia; empirical distribution; meta-analysis; p-values; pathway analysis.