Analysis of single-locus tests to detect gene/disease associations

Genet Epidemiol. 2005 Apr;28(3):207-19. doi: 10.1002/gepi.20050.

Abstract

A goal of association analysis is to determine whether variation in a particular candidate region or gene is associated with liability to complex disease. To evaluate such candidates, ubiquitous single nucleotide polymorphisms (SNPs) are useful. It is critical, however, to select a set of SNPs that are in substantial linkage disequilibrium (LD) with all other polymorphisms in the region. Whether there is an ideal statistical framework to test such a set of 'tag SNPs' for association is unknown. Recent evidence suggests that tests for association based on linear combinations of the tag SNPs (Hotelling T(2) test) are more powerful than tests based on frequencies of haplotypes. Following this logical progression, we asked whether single-locus tests would prove generally more powerful than the regression-based tests. We answer this question by investigating four inferential procedures: the maximum of a series of test statistics corrected for multiple testing by the Bonferroni procedure, T(B), or by permutation of case-control status, T(P); a procedure that tests the maximum of a smoothed curve fitted to the series of test statistics, T(S); and the Hotelling T(2) procedure, which we call T(R). These procedures are evaluated by simulating data resembling those from human populations, including realistic levels of LD and realistic effects of alleles conferring liability to disease. We find that power depends on the correlation structure of SNPs within a gene, the density of tag SNPs, and the placement of the liability allele. The clearest pattern emerges between power and the number of SNPs selected. When a large fraction of the SNPs within a gene are tested, and multiple SNPs are highly correlated with the liability allele, T(S) has better power. Using a SNP selection scheme that optimizes power but also requires a substantial number of SNPs to be genotyped (roughly 10-20 SNPs per gene), power of T(P) is generally superior to that for the other procedures, including T(R). Finally, when a SNP selection procedure that targets a minimal number of SNPs per gene is applied, the average performances of T(P) and T(R) are indistinguishable.
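
As an illustration of the two maximum-statistic procedures, T(B) and T(P), the following is a minimal sketch and not the authors' implementation. The abstract does not say which single-locus statistic was used, so the sketch assumes a trend-type statistic equal to N times the squared correlation between allele count (0/1/2) and case-control status (1/0), referred to a chi-square distribution with 1 degree of freedom. All function names and the toy data are hypothetical.

```python
import math
import random


def trend_stat(genotypes, status):
    """Assumed single-locus statistic: N * r^2, where r is the Pearson
    correlation between allele count and case-control status."""
    n = len(status)
    mean_g = sum(genotypes) / n
    mean_s = sum(status) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_ss = sum((s - mean_s) ** 2 for s in status)
    if s_gg == 0 or s_ss == 0:
        return 0.0  # monomorphic SNP or a single status class: no information
    s_gs = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, status))
    return n * s_gs * s_gs / (s_gg * s_ss)


def max_stat(snps, status):
    """Maximum of the single-locus statistics over all SNPs in the gene."""
    return max(trend_stat(g, status) for g in snps)


def bonferroni_p(snps, status):
    """T(B)-style p-value: p-value of the maximum statistic times the
    number of SNPs tested, capped at 1."""
    t_max = max_stat(snps, status)
    # Upper tail of a chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p_single = math.erfc(math.sqrt(t_max / 2.0))
    return min(1.0, len(snps) * p_single)


def permutation_p(snps, status, n_perm=1000, seed=0):
    """T(P)-style p-value: compare the observed maximum with its
    distribution under random relabelling of case-control status."""
    rng = random.Random(seed)
    observed = max_stat(snps, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # genotypes stay fixed, so LD is preserved
        if max_stat(snps, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (made up): 10 cases then 10 controls, two SNPs in perfect LD.
    status = [1] * 10 + [0] * 10
    snp = [2, 1, 1, 2, 1, 0, 1, 2, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
    snps = [snp, list(snp)]
    print("Bonferroni:", bonferroni_p(snps, status))
    print("Permutation:", permutation_p(snps, status))
```

Because the permutation step shuffles only the status labels, the correlation among SNPs is retained in every replicate. In the toy example the two SNPs are identical, so the Bonferroni correction doubles the p-value even though only one independent test is carried out, whereas the permutation reference distribution reflects the redundancy.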

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Computer Simulation
  • Genetic Predisposition to Disease / genetics*
  • Genotype
  • Haplotypes / genetics
  • Humans
  • Linkage Disequilibrium / genetics
  • Polymorphism, Single Nucleotide / genetics*
  • Regression Analysis
  • Software