Genotyping and inflated type I error rate in genome-wide association case/control studies

BMC Bioinformatics. 2009 Feb 23:10:68. doi: 10.1186/1471-2105-10-68.

Abstract

Background: One common goal of a case/control genome-wide association study (GWAS) is to find SNPs associated with a disease. Traditionally, the first step in such studies is to assign a genotype to each SNP in each subject, based on a statistic summarizing fluorescence measurements. When the distributions of the summary statistics are not well separated by genotype, the act of genotype assignment can cause more problems than the literature acknowledges.

Results: Specifically, we show that the proportions of each called genotype need not equal the true proportions in the population, even as the number of subjects grows infinitely large. The called genotypes for two subjects need not be independent, even when their true genotypes are independent. Consequently, p-values from tests of association can be anti-conservative, even when the distributions of the summary statistic for the cases and controls are identical. To address these problems, we propose two new algorithms designed to reduce the resulting inflation in the type I error rate. The first algorithm, logiCALL, measures call quality by fully exploring the likelihood profile of intensity measurements, and the second algorithm avoids genotyping by using a likelihood ratio statistic.
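
The first result above can be illustrated with a small calculation. The sketch below is not the calling procedure studied in the paper. It assumes a hypothetical setup in which the summary statistic is normally distributed around a genotype-specific mean and calls are made with two fixed cutoffs. The allele frequency, means, standard deviations, and cutoffs are all illustrative assumptions. Under those assumptions, it computes the proportions of called genotypes in the large-sample limit and compares them with the true proportions.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def limiting_called_proportions(true_props, means, sd, cutoffs):
    """Large-sample proportions of called genotypes (AA, AB, BB).

    Hypothetical calling rule: call AA below the lower cutoff, BB above
    the upper cutoff, and AB in between.
    """
    lower, upper = cutoffs
    called = [0.0, 0.0, 0.0]
    for prop, mean in zip(true_props, means):
        below = normal_cdf(lower, mean, sd)
        at_most_upper = normal_cdf(upper, mean, sd)
        called[0] += prop * below
        called[1] += prop * (at_most_upper - below)
        called[2] += prop * (1.0 - at_most_upper)
    return called


# Assumed values, chosen only for illustration.
q = 0.1                                             # minor allele frequency
true_props = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]  # Hardy-Weinberg proportions
means = [-1.0, 0.0, 1.0]                            # mean statistic per genotype
cutoffs = (-0.5, 0.5)

well_separated = limiting_called_proportions(true_props, means, 0.1, cutoffs)
overlapping = limiting_called_proportions(true_props, means, 0.5, cutoffs)

print("true           ", [round(p, 4) for p in true_props])
print("well separated ", [round(p, 4) for p in well_separated])
print("overlapping    ", [round(p, 4) for p in overlapping])
```

In this toy setting the well-separated case recovers the true proportions almost exactly, while the overlapping case shifts mass away from the common homozygote toward the rarer genotypes. Because these are limiting proportions, enlarging the sample does not remove the discrepancy.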

Conclusion: Genotyping can introduce avoidable false positives in GWAS.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Case-Control Studies
  • Computational Biology / methods*
  • False Positive Reactions
  • Genome-Wide Association Study*
  • Genotype*
  • Humans
  • Polymorphism, Single Nucleotide