Genome-wide association studies (GWASs) are critically dependent on detailed knowledge of the pattern of linkage disequilibrium (LD) in the human genome. GWASs generate lists of variants, usually SNPs, ranked according to the significance of their association with a trait. Downstream analyses generally focus on the gene or genes that are physically closest to these SNPs and ignore their LD profile with other SNPs. We have developed a flexible R package (LDsnpR) that efficiently assigns SNPs to genes on the basis of both their physical position and their pairwise LD with other SNPs. We used the positional-binning and LD-based-binning approaches to investigate whether including these "LD-based" SNPs would affect the interpretation of three published GWASs on bipolar affective disorder (BP) and of the imputed versions of two of these GWASs. We show how including LD can be important for interpreting and comparing GWASs. In the published, unimputed GWASs, LD-based binning effectively "recovered" 6.1%–8.3% of Ensembl-defined genes. It altered the ranks of the genes and resulted in nonnegligible differences between the lists of the top 2,000 genes emerging from the two binning approaches. It also improved the overall gene-based concordance between independent BP studies. In the imputed datasets, although the increases in coverage (>0.4%) and rank changes were more modest, even greater concordance between the studies was observed, attesting to the potential of LD-based binning on imputed data as well. Thus, ignoring LD can result in misinterpretation of GWAS findings and affect subsequent genetic and functional studies.
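The difference between the two binning approaches can be illustrated with a minimal Python sketch. It is not the LDsnpR implementation or its API: the function names, the toy coordinates, the SNP identifiers, and the r² threshold of 0.8 are all assumptions made for illustration. Positional binning assigns a genotyped SNP to a gene only if it lies inside the gene interval; LD-based binning additionally assigns a genotyped SNP to a gene if it is in sufficient pairwise LD with any SNP located inside that gene.

```python
# Illustrative sketch only; names, data, and the r2 cutoff are hypothetical.

def positional_bins(genes, snp_pos, genotyped):
    """Assign each genotyped SNP to every gene whose interval contains it."""
    bins = {gene: set() for gene in genes}
    for gene, (chrom, start, end) in genes.items():
        for snp in genotyped:
            s_chrom, pos = snp_pos[snp]
            if s_chrom == chrom and start <= pos <= end:
                bins[gene].add(snp)
    return bins


def ld_bins(genes, snp_pos, genotyped, ld_pairs, r2_min=0.8):
    """Positional bins plus genotyped SNPs in LD with a SNP inside the gene."""
    bins = positional_bins(genes, snp_pos, genotyped)
    for (a, b), r2 in ld_pairs.items():
        if r2 < r2_min:
            continue
        # Consider both orientations: one SNP inside the gene, its partner genotyped.
        for inside, partner in ((a, b), (b, a)):
            if partner not in genotyped:
                continue
            chrom_i, pos_i = snp_pos[inside]
            for gene, (chrom, start, end) in genes.items():
                if chrom_i == chrom and start <= pos_i <= end:
                    bins[gene].add(partner)
    return bins


def coverage(bins):
    """Fraction of genes with at least one assigned SNP."""
    return sum(1 for snps in bins.values() if snps) / len(bins)


# Toy data (hypothetical): two genes, four reference SNPs, two of them genotyped.
genes = {"GENE_A": ("1", 100, 200), "GENE_B": ("1", 500, 600)}
snp_pos = {"rs1": ("1", 150), "rs2": ("1", 550), "rs3": ("1", 300), "rs4": ("1", 350)}
genotyped = {"rs1", "rs3"}
ld_pairs = {("rs2", "rs3"): 0.9, ("rs1", "rs4"): 0.5}

pos_result = positional_bins(genes, snp_pos, genotyped)
ld_result = ld_bins(genes, snp_pos, genotyped, ld_pairs)
print(coverage(pos_result), coverage(ld_result))
```

In this toy example GENE_B contains no genotyped SNP, so positional binning leaves it uncovered, whereas LD-based binning assigns it the genotyped SNP rs3 through its LD with rs2. This is the sense in which a gene can be "recovered".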
Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.