Comparing the efficacy of SNP filtering methods for identifying a single causal SNP in a known association region

Ann Hum Genet. 2014 Jan;78(1):50-61. doi: 10.1111/ahg.12043. Epub 2013 Nov 11.

Abstract

Genome-wide association studies have successfully identified associations between common diseases and a large number of single nucleotide polymorphisms (SNPs) across the genome. We investigate the effectiveness of several statistics, including p-values, likelihoods, genetic map distance and linkage disequilibrium between SNPs, in filtering SNPs in several disease-associated regions. We use simulated data to compare the efficacy of filters with different sample sizes and for causal SNPs with different minor allele frequencies (MAFs) and effect sizes, focusing on the small effect sizes and MAFs likely to represent the majority of unidentified causal SNPs. In our analyses, of all the methods investigated, filtering on the ranked likelihoods consistently retains the true causal SNP with the highest probability for a given false positive rate. This was the case for all the local linkage disequilibrium patterns investigated. Our results indicate that when using this method to retain only the top 5% of SNPs, even a causal SNP with an odds ratio of 1.1 and MAF of 0.08 can be retained with a probability exceeding 0.9 using an overall sample size of 50,000.
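The ranked-likelihood filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each SNP in the region already has a score where larger means better supported (for example, the maximized log-likelihood of a single-SNP logistic model). The function names, the SNP identifiers and the numeric values are all hypothetical.

```python
import math


def retain_top_fraction(scores, fraction=0.05):
    """Return the set of SNP ids whose score ranks in the top `fraction`.

    `scores` maps a SNP id to a score where larger is better supported
    (assumed here: a maximized log-likelihood per SNP).
    At least one SNP is always retained.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, math.ceil(fraction * len(scores)))
    # Sort SNP ids from highest to lowest score and keep the first n_keep.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def retention_probability(replicates, causal_id, fraction=0.05):
    """Estimate how often the causal SNP survives the filter.

    `replicates` is a list of score dictionaries, one per simulated
    data set; the result is the share of replicates in which
    `causal_id` is among the retained SNPs.
    """
    if not replicates:
        raise ValueError("replicates must not be empty")
    hits = sum(causal_id in retain_top_fraction(r, fraction) for r in replicates)
    return hits / len(replicates)


# Hypothetical scores for a four-SNP region (illustrative values only).
example_scores = {"rs1": -10.0, "rs2": -5.0, "rs3": -7.5, "rs4": -20.0}
kept = retain_top_fraction(example_scores, fraction=0.5)
```

Ranking within a region, rather than applying a fixed threshold, fixes the proportion of SNPs retained, so different filtering statistics can be compared at the same retained fraction.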

Keywords: Fine-mapping; LD; causal variants; complex disease; likelihood; p-value; single nucleotide polymorphism.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Databases, Factual
  • Gene Frequency
  • Genome
  • Genome-Wide Association Study / methods*
  • Genotyping Techniques / methods
  • Humans
  • Linkage Disequilibrium
  • Logistic Models
  • Models, Genetic
  • Odds Ratio
  • Polymorphism, Single Nucleotide*
  • ROC Curve
  • Sample Size