Methods for collapsing multiple rare variants in whole-genome sequence data

Yun Ju Sung; Keegan D Korthauer; Michael D Swartz; Corinne D Engelman

doi:10.1002/gepi.21820

Genet Epidemiol. 2014 Sep;38 Suppl 1(0 1):S13-20. doi: 10.1002/gepi.21820.

Authors

Yun Ju Sung¹, Keegan D Korthauer, Michael D Swartz, Corinne D Engelman

Affiliation

¹ Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri, United States of America.

Abstract

Genetic Analysis Workshop 18 provided whole-genome sequence data in a pedigree-based sample and longitudinal phenotype data for hypertension and related traits, presenting an excellent opportunity for evaluating analysis choices. We summarize the nine contributions to the working group on collapsing methods, which evaluated various approaches for the analysis of multiple rare variants. One contributor defined a variant prioritization scheme, whereas the remaining eight contributors evaluated statistical methods for association analysis. Six contributors chose the gene as the genomic region for collapsing variants, whereas three contributors chose nonoverlapping sliding windows across the entire genome. Statistical methods spanned most of the published methods, including well-established burden tests, variance-components-type tests, and recently developed hybrid approaches. Lesser-known methods, such as functional principal components analysis, higher criticism, and homozygosity association, and some newly introduced methods were also used. We found that the performance of these methods depended on the characteristics of the genomic region, such as the effect size and direction of the variants under consideration. Except for MAP4 and FLT3, the performance of all statistical methods in identifying rare causal variants was disappointingly poor, providing overall power almost identical to the type I error. This poor performance may have arisen from a combination of (1) small sample size, (2) small effects of most of the causal variants, explaining a small fraction of variance, (3) use of incomplete annotation information, and (4) linkage disequilibrium between causal variants in a gene and noncausal variants in nearby genes. Our findings demonstrate challenges in analyzing rare variants identified from sequence data.

Keywords: Genetic Analysis Workshop 18; burden tests; nonburden tests; rare variants; whole-genome sequence.
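
The collapsing step that the abstract refers to can be illustrated with a minimal sketch. It is not the method of any particular contributor: it assumes a simple count-based burden score, an illustrative minor allele frequency cutoff of 0.01, and hypothetical names (burden_scores, window_labels, GENE_A, GENE_B). Variants are grouped either by gene or by nonoverlapping window, and each individual's rare minor-allele counts are summed within the region.

```python
from typing import Dict, Hashable, List, Sequence


def window_labels(positions: Sequence[int], window_size: int) -> List[int]:
    """Assign each variant to a nonoverlapping window by base-pair position."""
    return [pos // window_size for pos in positions]


def burden_scores(
    genotypes: Sequence[Sequence[int]],
    mafs: Sequence[float],
    regions: Sequence[Hashable],
    maf_threshold: float = 0.01,  # assumed cutoff for "rare"; not from the source
) -> Dict[Hashable, List[int]]:
    """Collapse rare variants into one burden score per individual per region.

    genotypes[i][j] is the minor-allele count (0, 1, or 2) of individual i
    at variant j; mafs[j] and regions[j] describe variant j.
    """
    n_individuals = len(genotypes)
    scores: Dict[Hashable, List[int]] = {}
    for j, maf in enumerate(mafs):
        if maf >= maf_threshold:
            continue  # common variants are not collapsed
        per_person = scores.setdefault(regions[j], [0] * n_individuals)
        for i, row in enumerate(genotypes):
            per_person[i] += row[j]
    return scores


# Toy data: 3 individuals, 4 variants (values are made up for illustration).
genotypes = [
    [0, 1, 0, 2],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
mafs = [0.005, 0.008, 0.2, 0.002]

# Gene-based collapsing (hypothetical gene labels).
by_gene = burden_scores(genotypes, mafs, ["GENE_A", "GENE_A", "GENE_A", "GENE_B"])

# Window-based collapsing with nonoverlapping 1,000-bp windows.
windows = window_labels([100, 1500, 2500, 2600], window_size=1000)
by_window = burden_scores(genotypes, mafs, windows)
```

The resulting per-region scores would then be tested for association with the trait; a burden test of this kind implicitly assumes that the collapsed variants act in the same direction, which is one reason performance depends on the effect size and direction of the variants in a region.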

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Genetic Variation*
Genotype
High-Throughput Nucleotide Sequencing
Homozygote
Humans
Hypertension / genetics
Hypertension / pathology
Linkage Disequilibrium
Pedigree
Phenotype
Polymorphism, Single Nucleotide
Sequence Analysis, DNA / methods*
