Selecting SNPs informative for African, American Indian and European Ancestry: application to the Family Investigation of Nephropathy and Diabetes (FIND)

Robert C Williams; Robert C Elston; Pankaj Kumar; William C Knowler; Hanna E Abboud; Sharon Adler; Donald W Bowden; Jasmin Divers; Barry I Freedman; Robert P Igo Jr; Eli Ipp; Sudha K Iyengar; Paul L Kimmel; Michael J Klag; Orly Kohn; Carl D Langefeld; David J Leehey; Robert G Nelson; Susanne B Nicholas; Madeleine V Pahl; Rulan S Parekh; Jerome I Rotter; Jeffrey R Schelling; John R Sedor; Vallabh O Shah; Michael W Smith; Kent D Taylor; Farook Thameem; Denyse Thornley-Brown; Cheryl A Winkler; Xiuqing Guo; Phillip Zager; Robert L Hanson; FIND Research Group

doi:10.1186/s12864-016-2654-x

BMC Genomics. 2016 May 4:17:325. doi: 10.1186/s12864-016-2654-x.

Authors

Robert C Williams¹, Robert C Elston², Pankaj Kumar³, William C Knowler³, Hanna E Abboud⁴, Sharon Adler⁵, Donald W Bowden⁶, Jasmin Divers⁶, Barry I Freedman⁶, Robert P Igo Jr², Eli Ipp⁵, Sudha K Iyengar², Paul L Kimmel⁷, Michael J Klag⁸, Orly Kohn⁹, Carl D Langefeld⁶, David J Leehey¹⁰, Robert G Nelson³, Susanne B Nicholas¹¹, Madeleine V Pahl¹², Rulan S Parekh¹³, Jerome I Rotter¹⁴, Jeffrey R Schelling¹⁵, John R Sedor¹⁵, Vallabh O Shah¹⁶, Michael W Smith¹⁷, Kent D Taylor¹⁴, Farook Thameem⁴,¹⁸, Denyse Thornley-Brown¹⁹, Cheryl A Winkler²⁰, Xiuqing Guo¹⁴, Phillip Zager¹⁶, Robert L Hanson³; FIND Research Group

Affiliations

¹ Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ, 85014, USA.
² Genetic Analysis and Data Coordinating Center, Case Western Reserve University, Cleveland, OH, 44104, USA.
³ Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Phoenix, AZ, 85014, USA.
⁴ Division of Nephrology, The University of Texas Health Science Center, San Antonio, TX, 78229, USA.
⁵ Department of Nephrology, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA.
⁶ Wake Forest School of Medicine, Winston-Salem, NC, 27157, USA.
⁷ National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 20892, USA.
⁸ Welch Center for Prevention, Epidemiology, and Clinical Research, Baltimore, MD, 21205, USA.
⁹ The University of Chicago Medical Center, Chicago, IL, 60637, USA.
¹⁰ Loyola University Medical Center, Chicago, IL, 60153, USA.
¹¹ Divisions of Nephrology and Endocrinology, David Geffen School of Medicine at UCLA, Los Angeles, CA, 90095, USA.
¹² Division of Nephrology and Hypertension, Department of Medicine, UC Irvine School of Medicine, University of California, Orange, CA, 92868, USA.
¹³ Hospital for Sick Children, University Health Network and the University of Toronto, Ontario, M5G1X8, Canada.
¹⁴ Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA.
¹⁵ Departments of Medicine and Physiology and Biophysics, Case Western Reserve University, Cleveland, OH, 44104, USA.
¹⁶ The University of New Mexico, Albuquerque, NM, 87131, USA.
¹⁷ National Human Genome Research Institute, NIH, Bethesda, MD, 20892, USA.
¹⁸ Department of Biochemistry, Faculty of Medicine, Kuwait University, Kuwait City, Kuwait.
¹⁹ The University of Alabama at Birmingham, Birmingham, AL, 35233, USA.
²⁰ Center for Cancer Research, National Cancer Institute, NIH, Leidos Biomedical, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.

Abstract

Background: The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), European Americans, Mexican Americans, African Americans, and American Indians, are part of a genome-wide association study in which population structure might be particularly important. We therefore decided to study in detail one component of this, individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one of the three contrasts among ancestral populations: Europeans (HAPMAP, CEU), Africans (HAPMAP, YRI and LWK), and Native Americans (full heritage Pima Indians). We estimate IGA and present an algorithm for the standard errors of the estimates, compare IGA to principal components, emphasize the importance of balancing information in the AIMs, and test the association of IGA with diabetic nephropathy in the combined sample.
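The idea of building one AIM set per pairwise contrast, with the same number of markers in each so that no contrast dominates, can be illustrated with a minimal sketch. The abstract does not state which informativeness measure was maximized, so the sketch assumes the absolute allele-frequency difference between two populations as a stand-in. The function name `select_balanced_aims`, the SNP identifiers, the population labels, and the frequencies are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def select_balanced_aims(freqs, pops, per_contrast):
    """Pick the same number of SNPs for every pairwise population contrast.

    freqs: dict mapping SNP id -> dict mapping population -> allele frequency.
    Informativeness is ASSUMED to be the absolute frequency difference;
    the paper's actual criterion may differ.
    """
    chosen = {}
    used = set()
    for a, b in combinations(pops, 2):
        # Rank SNPs not yet assigned to an earlier contrast by |p_a - p_b|.
        ranked = sorted(
            (snp for snp in freqs if snp not in used),
            key=lambda snp: abs(freqs[snp][a] - freqs[snp][b]),
            reverse=True,
        )
        picks = ranked[:per_contrast]
        used.update(picks)
        chosen[(a, b)] = picks
    return chosen


# Made-up frequencies for illustration only.
toy_freqs = {
    "snp1": {"EUR": 0.90, "AFR": 0.10, "AMI": 0.50},
    "snp2": {"EUR": 0.20, "AFR": 0.80, "AMI": 0.25},
    "snp3": {"EUR": 0.10, "AFR": 0.15, "AMI": 0.90},
    "snp4": {"EUR": 0.85, "AFR": 0.80, "AMI": 0.10},
    "snp5": {"EUR": 0.50, "AFR": 0.05, "AMI": 0.95},
    "snp6": {"EUR": 0.30, "AFR": 0.85, "AMI": 0.10},
    "snp7": {"EUR": 0.50, "AFR": 0.50, "AMI": 0.50},
}
aims = select_balanced_aims(toy_freqs, ["EUR", "AFR", "AMI"], per_contrast=2)
for contrast, snps in aims.items():
    print(contrast, snps)
```

The greedy, contrast-by-contrast assignment is a simplification chosen for brevity; it only guarantees equal marker counts per contrast, not equal total information.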

Results: A fixed parental allele maximum likelihood algorithm was applied to the FIND to estimate IGA in four samples: 869 American Indians; 1385 African Americans; 1451 Mexican Americans; and 826 European Americans. When the information in the AIMs is unbalanced, the estimates are incorrect with large error. Individual genetic admixture is highly correlated with principal components for capturing population structure. It takes ~700 SNPs to reduce the average standard error of individual admixture below 0.01. When the samples are combined, the resulting population structure creates associations between IGA and diabetic nephropathy.
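To make the notion of maximum likelihood estimation with fixed parental allele frequencies concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: it assumes independent biallelic SNPs, models an individual's allele frequency at each SNP as the admixture-weighted average of the parental frequencies, and maximizes the binomial likelihood with a standard EM update. The function names, the simulated frequencies, and the true proportions (0.6, 0.3, 0.1) are hypothetical, and the standard-error algorithm mentioned in the abstract is not reproduced.

```python
import random


def estimate_admixture(genotypes, freqs, n_iter=300, tol=1e-8):
    """EM estimate of admixture proportions; parental frequencies stay fixed.

    genotypes: per-SNP count (0, 1 or 2) of the reference allele.
    freqs: per-SNP tuple of reference-allele frequencies, one per parental
           population (assumed known and not re-estimated).
    """
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal proportions
    n_alleles = 2 * len(genotypes)
    for _ in range(n_iter):
        new_q = [0.0] * k
        for g, p in zip(genotypes, freqs):
            # Expected reference-allele frequency for this individual.
            f = sum(qi * pi for qi, pi in zip(q, p))
            f = min(max(f, 1e-12), 1.0 - 1e-12)  # guard against division by zero
            for i in range(k):
                # Posterior share of each allele copy attributed to population i.
                new_q[i] += g * q[i] * p[i] / f
                new_q[i] += (2 - g) * q[i] * (1.0 - p[i]) / (1.0 - f)
        new_q = [x / n_alleles for x in new_q]
        change = max(abs(a - b) for a, b in zip(q, new_q))
        q = new_q
        if change < tol:
            break
    return q


def simulate_individual(true_q, n_snps, rng):
    """Draw made-up parental frequencies and one admixed individual's genotypes."""
    freqs, genotypes = [], []
    for _ in range(n_snps):
        p = tuple(rng.uniform(0.05, 0.95) for _ in true_q)
        f = sum(qi * pi for qi, pi in zip(true_q, p))
        g = (rng.random() < f) + (rng.random() < f)  # two independent allele draws
        freqs.append(p)
        genotypes.append(g)
    return genotypes, freqs


rng = random.Random(0)
true_q = (0.6, 0.3, 0.1)
genotypes, freqs = simulate_individual(true_q, 3000, rng)
q_hat = estimate_admixture(genotypes, freqs)
print([round(x, 3) for x in q_hat])
```

The simulated frequencies are drawn at random rather than selected for informativeness, so the precision seen here should not be compared with the ~700 SNP figure reported for the selected AIMs.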

Conclusions: The identified set of AIMs, which include American Indian parental allele frequencies, may be particularly useful for estimating genetic admixture in populations from the Americas. Failure to balance information in maximum likelihood, poly-ancestry models creates biased estimates of individual admixture with large error. This also occurs when estimating IGA using the Bayesian clustering method as implemented in the program STRUCTURE. Odds ratios for the associations of IGA with disease are consistent with what is known about the incidence and prevalence of diabetic nephropathy in these populations.

Keywords: Diabetic nephropathy; Individual genetic ancestry; Population structure; SNP.

MeSH terms

Algorithms
Black or African American / genetics*
Chromosome Mapping
Diabetic Nephropathies / ethnology
Diabetic Nephropathies / genetics*
Genetic Markers / genetics
Genetic Predisposition to Disease
Genome-Wide Association Study / methods
Humans
Indians, North American / genetics*
Likelihood Functions
Mexican Americans / genetics*
Models, Genetic
Oligonucleotide Array Sequence Analysis / methods
Polymorphism, Single Nucleotide*
Principal Component Analysis
United States / ethnology
White People / genetics*

Substances

Genetic Markers
