Quantification of population structure using correlated SNPs by shrinkage principal components

Fei Zou; Seunggeun Lee; Michael R Knowles; Fred A Wright

doi:10.1159/000288706

Hum Hered. 2010;70(1):9-22. doi: 10.1159/000288706. Epub 2010 Apr 23.

Authors

Fei Zou¹, Seunggeun Lee, Michael R Knowles, Fred A Wright

Affiliation

¹ Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA. fzou@bios.unc.edu

Abstract

Background/aims: Association studies using unrelated individuals have become the most popular design for mapping complex traits. One of the major challenges of association mapping is avoiding spurious association due to population stratification. Principal component analysis (PCA) on genome-wide marker genotypes is one of the most popular population stratification control methods. It implicitly assumes that the markers are in linkage equilibrium, a condition that is rarely satisfied and that we plan to relax.

Methods: We carefully examined the impact of linkage disequilibrium (LD) on PCA, and proposed a simple modification of the standard PCA to automatically adjust for the correlations among markers.

Results: We demonstrated that LD patterns in genome-wide association datasets can distort the techniques for stratification control, showing 'subpopulations' reflecting localized LD phenomena rather than plausible population structure. We showed that the proposed method effectively removes the artifactual effect of LD patterns, and successfully recovers underlying population structure that is not apparent from standard PCA.

Conclusion: PCA is highly influenced by sets of SNPs in high LD, which obscures the true population substructure. Our shrinkage PCA can be applied to all available markers, regardless of their LD patterns. The proposed method is easier to implement than most existing LD-adjusted PCA methods.
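Illustrative sketch: the abstract does not give the shrinkage estimator, so the code below is not the authors' method. It only shows the general idea of down-weighting correlated SNPs before PCA. The function names (`standardize`, `ld_weights`, `top_pcs`), the `window` parameter, and the weighting rule (each SNP is scaled by the inverse square root of its summed r² with neighboring SNPs) are all assumptions chosen for this example.

```python
import numpy as np

def standardize(G):
    """Center and scale each SNP column of an (individuals x SNPs) genotype matrix."""
    G = np.asarray(G, dtype=float)
    centered = G - G.mean(axis=0)
    sd = centered.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic SNPs stay at zero after centering
    return centered / sd

def ld_weights(Z, window=10):
    """Assumed weighting rule (hypothetical, not taken from the paper):
    w_j = 1 / sqrt(sum of r^2 between SNP j and SNPs within `window` positions),
    where the sum includes SNP j itself, so an isolated SNP keeps weight 1."""
    n, p = Z.shape
    r2 = (Z.T @ Z / n) ** 2      # squared correlations between standardized SNPs
    np.fill_diagonal(r2, 1.0)    # guards against division by zero for monomorphic SNPs
    w = np.empty(p)
    for j in range(p):
        lo, hi = max(0, j - window), min(p, j + window + 1)
        w[j] = 1.0 / np.sqrt(r2[j, lo:hi].sum())
    return w

def top_pcs(Z, k=2):
    """Return the first k principal component scores via SVD."""
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

# Toy data: SNPs 0-4 are identical (perfect LD); SNP 5 is uncorrelated with them.
G = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 2],
    [2, 2, 2, 2, 2, 0],
    [2, 2, 2, 2, 2, 2],
])
Z = standardize(G)
w = ld_weights(Z, window=10)

pcs_standard = top_pcs(Z)      # the LD block dominates the leading component
pcs_weighted = top_pcs(Z * w)  # the LD block now counts as one effective SNP
```

In this toy example the five perfectly correlated SNPs together carry the same total variance as the single independent SNP once the weights are applied. This mirrors the abstract's point that blocks of high-LD markers can otherwise dominate the leading components.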

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Genetic Markers
Humans
Polymorphism, Single Nucleotide*
Principal Component Analysis

Substances

Genetic Markers
