Methods to increase reproducibility in differential gene expression via meta-analysis

Nucleic Acids Res. 2017 Jan 9;45(1):e1. doi: 10.1093/nar/gkw797. Epub 2016 Sep 14.

Abstract

Findings from clinical and biological studies are often not reproducible when tested in independent cohorts. Results from whole-genome expression studies are particularly prone to this problem because they test a large number of hypotheses on relatively small sample sizes. Compared to single-study analysis, gene expression meta-analysis can improve reproducibility by integrating data from multiple studies. However, designing and carrying out a meta-analysis involves many choices, and clear guidelines on best practices are scarce. Here, we hypothesized that studying subsets of very large meta-analyses would allow for systematic identification of best practices to improve reproducibility. We therefore constructed three very large gene expression meta-analyses from clinical samples, and then compared meta-analyses of subsets of the datasets (all combinations of datasets with up to N/2 samples and K/2 datasets) to a 'silver standard' of differentially expressed genes found in the entire cohort. We tested three random-effects meta-analysis models using this procedure. We showed relatively greater reproducibility when more-stringent effect size thresholds were combined with relaxed significance thresholds; relatively lower reproducibility when imposing extraneous constraints on residual heterogeneity; and an underestimation of the actual false positive rate by Benjamini-Hochberg correction. In addition, multivariate regression showed that the accuracy of a meta-analysis increased significantly with more included datasets even when controlling for sample size.
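
The kind of gene-selection rule discussed in the abstract (a pooled effect size threshold combined with a multiple-testing-corrected significance threshold) can be illustrated with a minimal sketch. The abstract does not name the three random-effects models tested, so the DerSimonian-Laird estimator used here is an assumption, chosen only because it is a common random-effects method. The function names, the input layout (per-study effect sizes and variances for each gene), and the threshold values are likewise hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model.

    Assumption: this estimator stands in for the unnamed models in the abstract.
    Returns (pooled effect, standard error, tau^2, two-sided p-value).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, se, tau2, p


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(gene_stats, effect_threshold, fdr_threshold):
    """Keep genes passing both an effect size and an adjusted-p threshold.

    gene_stats maps a gene name to (effects, variances) across studies
    (a hypothetical layout for this sketch).
    """
    names = list(gene_stats)
    results = [dersimonian_laird(*gene_stats[g]) for g in names]
    adj = benjamini_hochberg([r[3] for r in results])
    return [
        g
        for g, r, a in zip(names, results, adj)
        if abs(r[0]) >= effect_threshold and a <= fdr_threshold
    ]
```

With this layout, calling `select_genes(stats, effect_threshold=0.5, fdr_threshold=0.1)` would apply a comparatively strict effect size cutoff together with a relaxed significance cutoff; both numbers are placeholders, not values taken from the paper.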

MeSH terms

  • Adenocarcinoma / genetics
  • Adenocarcinoma / pathology
  • Adenocarcinoma of Lung
  • Cardiomyopathies / genetics
  • Cardiomyopathies / pathology
  • Cohort Studies
  • Datasets as Topic
  • Gene Expression Profiling
  • Gene Expression Regulation*
  • Genetic Heterogeneity
  • Genome, Human*
  • Genome-Wide Association Study / statistics & numerical data*
  • Graft Rejection / genetics
  • Graft Rejection / pathology
  • Guidelines as Topic
  • Humans
  • Influenza, Human / genetics
  • Influenza, Human / pathology
  • Kidney Transplantation
  • Lung Neoplasms / genetics
  • Lung Neoplasms / pathology
  • Meta-Analysis as Topic*
  • Models, Statistical*
  • Reproducibility of Results
  • Sample Size
  • Sepsis / genetics
  • Sepsis / pathology
  • Tuberculosis, Pulmonary / genetics
  • Tuberculosis, Pulmonary / pathology