Correcting for batch effects in case-control microbiome studies

PLoS Comput Biol. 2018 Apr 23;14(4):e1006102. doi: 10.1371/journal.pcbi.1006102. eCollection 2018 Apr.

Abstract

High-throughput data generation platforms, such as mass spectrometry, microarrays, and second-generation sequencing, are susceptible to batch effects due to run-to-run variation in reagents, equipment, protocols, or personnel. Currently, batch-correction methods are not commonly applied to microbiome sequencing datasets. In this paper, we compare different batch-correction methods applied to microbiome case-control studies. We introduce a model-free normalization procedure in which features (i.e., bacterial taxa) in case samples are converted to percentiles of the equivalent features in control samples within a study prior to pooling data across studies. We compare this percentile-normalization method to traditional meta-analysis methods for combining independent p-values and to limma and ComBat, widely used batch-correction models developed for RNA microarray data. Overall, we show that percentile-normalization is a simple, non-parametric approach for correcting batch effects and improving sensitivity in case-control meta-analyses.
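
The percentile-normalization step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `percentile_normalize`, the samples-by-taxa array layout, and the tie-handling rule (averaging the strict and non-strict ranks) are assumptions made for illustration, since the abstract does not specify them. The sketch needs only NumPy.

```python
import numpy as np


def percentile_normalize(samples, controls):
    """Convert each feature in `samples` to its percentile within `controls`.

    Both arguments are (n_samples, n_taxa) abundance arrays from ONE study.
    Hypothetical helper; the tie rule below is an assumption, not taken
    from the abstract.
    """
    samples = np.asarray(samples, dtype=float)
    controls = np.asarray(controls, dtype=float)
    n_controls = controls.shape[0]
    out = np.empty_like(samples)
    for j in range(samples.shape[1]):
        ref = np.sort(controls[:, j])
        # Number of control values strictly below / at or below each sample.
        below = np.searchsorted(ref, samples[:, j], side="left")
        at_or_below = np.searchsorted(ref, samples[:, j], side="right")
        # Average the two counts so that ties land mid-way through the tie.
        out[:, j] = 100.0 * (below + at_or_below) / (2.0 * n_controls)
    return out


# Toy study with one taxon: four controls and four cases.
controls_a = np.array([[1.0], [2.0], [3.0], [4.0]])
cases_a = np.array([[2.5], [5.0], [0.0], [2.0]])
pct_a = percentile_normalize(cases_a, controls_a)

# A second "study" with the same biology but a 10x multiplicative batch shift.
pct_b = percentile_normalize(cases_a * 10.0, controls_a * 10.0)

# Percentiles are unchanged by the shift, so the two studies can be pooled.
pooled_cases = np.vstack([pct_a, pct_b])
```

Because the transform depends only on the rank of each case value among that study's own controls, any monotone study-specific distortion of the abundances drops out before pooling. In practice the controls would presumably be passed through the same function (against themselves) so that both groups are on the percentile scale; that detail is also an assumption here.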

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Colorectal Neoplasms / microbiology
  • Computational Biology
  • Computer Simulation
  • Data Interpretation, Statistical
  • Databases, Nucleic Acid / statistics & numerical data
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Meta-Analysis as Topic
  • Microbiota / genetics*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Statistics, Nonparametric