DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts

Donghyung Lee; T Bernard Bigdeli; Vernell S Williamson; Vladimir I Vladimirov; Brien P Riley; Ayman H Fanous; Silviu-Alin Bacanu

doi:10.1093/bioinformatics/btv348

Bioinformatics. 2015 Oct 1;31(19):3099-104. doi: 10.1093/bioinformatics/btv348. Epub 2015 Jun 9.

Authors

Donghyung Lee¹, T Bernard Bigdeli², Vernell S Williamson², Vladimir I Vladimirov³, Brien P Riley², Ayman H Fanous², Silviu-Alin Bacanu²

Affiliations

¹ Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA.
² Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA.
³ Department of Psychiatry, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA, Center for Biomarker Research & Personalized Medicine, Virginia Commonwealth University, Richmond, VA 23298, USA and Lieber Institute for Brain Development, Johns Hopkins University, Baltimore, MD 21205, USA.

Abstract

Motivation: To increase the signal resolution of large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. Far less demanding methods avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST), proposed by our group, but their implicit assumptions make them applicable only to ethnically homogeneous cohorts.

Results: To decrease the computational and data-access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST to mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated or user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rate. Imputing summary statistics from the ethnically diverse Psychiatric Genomics Consortium Schizophrenia Phase 2 with the 1000 Genomes panel suggests that, compared with genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of the computational resources.
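The idea of imputing a statistic at an unmeasured SNP from statistics at measured SNPs and ethnic proportions can be illustrated with a minimal sketch. It is not the DISTMIX software. It assumes, beyond what the abstract states, that the cohort's SNP correlation (LD) matrix is approximated by a proportion-weighted average of per-population reference matrices, and that the unmeasured Z-score is the conditional expectation under a multivariate normal model. All function and parameter names (`mix_correlation`, `impute_z`, `ridge`) are hypothetical.

```python
def mix_correlation(panels, weights):
    """Proportion-weighted average of per-population correlation matrices.

    Assumption of this sketch: the mixed cohort's LD is approximated by
    weighting each reference population's matrix by its ethnic proportion.
    """
    n = len(panels[0])
    return [[sum(w * p[i][j] for p, w in zip(panels, weights))
             for j in range(n)] for i in range(n)]


def solve(a, b, ridge=0.0):
    """Solve (a + ridge * I) x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [[a[i][j] + (ridge if i == j else 0.0) for j in range(n)] + [b[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(z_measured, panels, weights, measured_idx, target_idx, ridge=0.0):
    """Impute the Z-score at one unmeasured SNP.

    Returns (z_u, info), where z_u = S_um S_mm^{-1} z_m and
    info = S_um S_mm^{-1} S_mu is the fraction of variance explained.
    """
    sigma = mix_correlation(panels, weights)
    s_mm = [[sigma[i][j] for j in measured_idx] for i in measured_idx]
    s_um = [sigma[target_idx][j] for j in measured_idx]
    # S_mm is symmetric, so w solving S_mm w = S_mu gives z_u = w . z_m.
    w = solve(s_mm, s_um, ridge)
    z_u = sum(wi * zi for wi, zi in zip(w, z_measured))
    info = sum(wi * si for wi, si in zip(w, s_um))
    return z_u, info


# Two hypothetical reference populations, two SNPs (index 0 is measured).
pop_a = [[1.0, 0.8], [0.8, 1.0]]
pop_b = [[1.0, 0.4], [0.4, 1.0]]
z_u, info = impute_z([2.0], [pop_a, pop_b], [0.5, 0.5],
                     measured_idx=[0], target_idx=1)
```

In this toy case the mixed correlation between the two SNPs is about 0.6, so the imputed Z-score is about 0.6 times the measured one. The `ridge` term is an illustrative safeguard for near-singular correlation matrices, not a documented DISTMIX option.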

Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix.

Contact: [email protected]

Supplementary information: Supplementary Data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Cohort Studies
Computational Biology / methods*
Computer Simulation
Databases, Genetic
Ethnicity / genetics*
Genome-Wide Association Study
Humans
Polymorphism, Single Nucleotide / genetics*
Software*
Statistics as Topic*
