Analyzing Illumina gene expression microarray data from different tissues: methodological aspects of data analysis in the MetaXpress consortium

PLoS One. 2012;7(12):e50938. doi: 10.1371/journal.pone.0050938. Epub 2012 Dec 7.

Abstract

Microarray profiling of gene expression is widely applied in molecular biology and functional genomics. Experimental and technical variations make meta-analysis of different studies challenging. In a total of 3358 samples, all from German population-based cohorts, we investigated the effect of data preprocessing and the variability due to sample processing in whole blood cell and blood monocyte gene expression data, measured on the Illumina HumanHT-12 v3 BeadChip array. Gene expression signal intensities were similar after applying the log2 or the variance-stabilizing transformation. In all cohorts, the first principal component (PC) explained more than 95% of the total variation. Technical factors substantially influenced signal intensity values, especially the Illumina chip assignment (33-48% of the variance), the RNA amplification batch (12-24%), the RNA isolation batch (16%), and the sample storage time, in particular the time between blood donation and RNA isolation for the whole blood cell samples (2-3%), and the time between RNA isolation and amplification for the monocyte samples (2%). White blood cell composition parameters were the strongest biological factors influencing the expression signal intensities in the whole blood cell samples (3%), followed by sex (1-2%) in both sample types. Known single nucleotide polymorphisms (SNPs) were located in 38% of the analyzed probe sequences and 4% of them included common SNPs (minor allele frequency > 5%). Out of the tested SNPs, 1.4% significantly modified the probe-specific expression signals (Bonferroni-corrected p-value < 0.05), but in almost half of these events the signal intensities were even increased despite the occurrence of the mismatch. Thus, the vast majority of SNPs within probes had no significant effect on hybridization efficiency. In summary, adjustment for a few selected technical factors greatly improved reliability of gene expression analyses. Such adjustments are particularly required for meta-analyses.
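
To make the reported quantities concrete, the following minimal Python sketch shows one common way to compute the share of variance attributable to a categorical technical factor (such as chip assignment) and to apply a Bonferroni threshold. The abstract does not state how the authors estimated these values, so the one-way eta-squared used here, the function names, and the toy data are all illustrative assumptions rather than the study's method.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained(values, groups):
    """Fraction of the total sum of squares in `values` that lies between
    the levels of `groups` (one-way ANOVA eta-squared).
    Assumption: the study may have used a different estimator."""
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    between_ss = sum(
        len(vs) * (fmean(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return between_ss / total_ss


def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose Bonferroni-corrected p-value is below `alpha`."""
    m = len(p_values)
    return [min(p * m, 1.0) < alpha for p in p_values]


# Hypothetical log2 intensities of one probe on two chips (toy data).
intensities = [8.0, 8.2, 9.0, 9.2]
chips = ["chip_A", "chip_A", "chip_B", "chip_B"]
share = variance_explained(intensities, chips)

# Hypothetical raw p-values for three SNP-in-probe tests (toy data).
flags = bonferroni_significant([0.0001, 0.02, 0.5])
```

In the toy data almost all of the variation lies between the two chips, which is the kind of technical effect the abstract argues should be adjusted for before a meta-analysis.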

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling / methods*
  • Gene Expression*
  • Germany
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results

Grants and funding

SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the BMBF (German Ministry of Education and Research, http://www.bmbf.de), the Ministry of Cultural Affairs (http://www.regierung-mv.de/cms2/Regierungsportal_prod/Regierungsportal/de/bm/) as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania (http://www.regierung-mv.de/cms2/Regierungsportal_prod/Regierungsportal/de/sm/). Analyses were supported by the “Greifswald Approach to Individualized Medicine (GANI_MED, http://www.gani-med.de/)” consortium funded by the BMBF (grant 03IS2061A). Genome-wide genotyping and expression data have been supported by the BMBF (grant no. 03ZIK012) and a joint grant from Siemens Healthcare, Erlangen, Germany (http://www.siemens.com/) and the Federal State of Mecklenburg-West Pomerania (http://www.regierung-mv.de/). The University of Greifswald is a member of the ‘Center of Knowledge Interchange’ program of the Siemens AG and the Caché Campus program of the InterSystems GmbH (http://www.intersystems.com). The KORA research platform and the KORA Augsburg studies are financed by the Helmholtz Zentrum München, German Research Center for Environmental Health (http://www.helmholtz-muenchen.de/), which is funded by the BMBF and by the State of Bavaria (http://www.bayern.de/). The German Diabetes Center is funded by the German Federal Ministry of Health (http://www.bmg.bund.de/) and the Ministry of School, Science and Research of the State of North-Rhine-Westphalia (http://www.innovation.nrw.de/). The Diabetes Cohort Study was funded by a German Research Foundation (http://www.dfg.de) project grant to W.R. (DFG; RA 459/2-1). This study was supported in part by a grant from the BMBF to the German Center for Diabetes Research (DZD e.V., http://www.dzd-ev.de/). This work was supported by the BMBF funded Systems Biology of Metabotypes grant (SysMBo#0315494A).
Additional support was obtained from the BMBF (National Genome Research Network NGFNplus Atherogenomics, 01GS0834) and the Leibniz Association (http://www.wgl.de/) (WGL Pakt für Forschung und Innovation). The Gutenberg Health Study is funded through the government of Rheinland-Pfalz (http://www.rlp.de/) (“Stiftung Rheinland Pfalz für Innovation”, contract AZ 961–386261/733), the research programs “Wissen schafft Zukunft” and “Schwerpunkt Vaskuläre Prävention” of the Johannes Gutenberg-University of Mainz (http://www.uni-mainz.de/), and its contract with Boehringer Ingelheim (http://www.boehringer-ingelheim.de/) and PHILIPS Medical Systems (http://www.healthcare.philips.com/), including an unrestricted grant for the Gutenberg Health Study. Specifically, the research reported in this article was supported by the National Genome Network “NGFNplus” (http://www.ngfn.de/en/start.html) (contract 01GS0833 and 01GS0831) by the BMBF, and a joint funding grant from the BMBF, and the Agence Nationale de la Recherche, France (http://www.agence-nationale-recherche.fr/) (contract BMBF 01KU0908A and ANR 09 GENO 106 01). This work was supported in part by the European Union (http://europa.eu/) (HEALTH-2011-278913), the BMBF (grants 01KU0908A, 01KU0908B, 0315536F), and supported by the DZHK (Deutsches Zentrum für Herz-Kreislauf-Forschung – German Centre for Cardiovascular Research, http://www.bmbf.de/de/16542.php, http://www.dzhk.de) and by the BMBF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.