reGenotyper: Detecting mislabeled samples in genetic data

PLoS One. 2017 Feb 13;12(2):e0171324. doi: 10.1371/journal.pone.0171324. eCollection 2017.

Abstract

In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the "ideal" genotype and identify "best-matched" labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a "data cleaning" step before standard data analysis.
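The abstract does not spell out the algorithm. The following is a minimal illustrative sketch of the general "best-matched label" idea only, not the reGenotyper implementation. It assumes that a genotype has already been inferred for each sample from its molecular data (for example, from expression at markers with strong genetic effects; this step is not shown), and that genotypes are coded as integers per marker, with `None` for missing calls. Each sample's inferred genotype is compared against every recorded label. A sample is flagged when the best-matching label differs from its own. The function names `concordance`, `best_matched_labels` and `flag_mislabeled` are hypothetical.

```python
from typing import Dict, List, Optional

Genotype = List[Optional[int]]  # one integer code per marker; None = missing call


def concordance(a: Genotype, b: Genotype) -> float:
    """Fraction of non-missing markers at which two genotype vectors agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def best_matched_labels(inferred: Dict[str, Genotype],
                        recorded: Dict[str, Genotype]) -> Dict[str, str]:
    """For each sample, return the recorded label whose genotype best matches
    the genotype inferred from that sample's data (ties: first label wins)."""
    best: Dict[str, str] = {}
    for sample, geno in inferred.items():
        scores = {label: concordance(geno, ref) for label, ref in recorded.items()}
        best[sample] = max(scores, key=scores.get)
    return best


def flag_mislabeled(inferred: Dict[str, Genotype],
                    recorded: Dict[str, Genotype]) -> Dict[str, str]:
    """Samples whose best-matched label is not their own, mapped to that label."""
    best = best_matched_labels(inferred, recorded)
    return {s: label for s, label in best.items() if label != s}


# Hypothetical toy data: three samples, five biallelic markers coded 0/1.
# The inferred genotypes of B and C correspond to each other's recorded labels.
recorded = {"A": [0, 0, 1, 1, 0], "B": [1, 0, 0, 1, 1], "C": [0, 1, 1, 0, 1]}
inferred = {"A": [0, 0, 1, 1, 0], "B": [0, 1, 1, 0, 1], "C": [1, 0, 0, 1, 1]}

print(flag_mislabeled(inferred, recorded))  # {'B': 'C', 'C': 'B'}
```

Simple per-marker concordance is used here only for readability. A real analysis would need a similarity measure and threshold suited to the genotype coding and to the noise in the inferred genotypes.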

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology / methods*
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Genomics / methods
  • Genotype
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci / genetics*
  • Reproducibility of Results

Grants and funding

This work was supported by the 7th Framework Programme of the European Commission under the Research Project PANACEA [Contract No. 222936 to RCJ and JEK]; the Netherlands Organisation for Scientific Research (NWO) VENI grant [n° 863.13.011 to YL]; and the Dutch Carbohydrate Competence Center, which is co-financed by the European Regional Development Fund, the Dutch Ministry of Economic Affairs (as part of Pieken in de Delta, the government’s regional economic agenda), the Municipality of Groningen and the Province of Groningen [CCC WP23 to KZ]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.