A hierarchical and modular approach to the discovery of robust associations in genome-wide association studies from pooled DNA samples

Paola Sebastiani; Zhenming Zhao; Maria M Abad-Grau; Alberto Riva; Stephen W Hartley; Amanda E Sedgewick; Alessandro Doria; Monty Montano; Efthymia Melista; Dellara Terry; Thomas T Perls; Martin H Steinberg; Clinton T Baldwin

doi:10.1186/1471-2156-9-6

BMC Genet. 2008 Jan 14:9:6. doi: 10.1186/1471-2156-9-6.

Authors

Paola Sebastiani¹, Zhenming Zhao, Maria M Abad-Grau, Alberto Riva, Stephen W Hartley, Amanda E Sedgewick, Alessandro Doria, Monty Montano, Efthymia Melista, Dellara Terry, Thomas T Perls, Martin H Steinberg, Clinton T Baldwin

Affiliation

¹ Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA.

Abstract

Background: A central challenge in analyzing pooling-based genome-wide association studies is identifying authentic associations among potentially thousands of false positive associations.

Results: We present a hierarchical and modular approach to the analysis of genome-wide genotype data that incorporates quality control, linkage disequilibrium, physical distance, and gene ontology to identify authentic associations among those found by statistical association tests. The method is developed for the allelic association analysis of pooled DNA samples, but it generalizes readily to individually genotyped samples. We evaluate the approach on data sets from diverse genome-wide association studies, including a study of fetal hemoglobin levels in sickle cell anemia and a study of a sample of centenarians, and show that it is highly reproducible and allows for discovery at different levels of synthesis.

Conclusion: Results from the integration of Bayesian tests and other machine learning techniques with linkage disequilibrium data suggest that overly stringent thresholds are not needed to reduce the number of false positive associations. The method yields increased power even with relatively small samples: in our evaluation, it reaches almost 70% sensitivity with samples of only 100 subjects.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Bayes Theorem
Computational Biology
DNA / genetics*
Fetal Hemoglobin / genetics
Gene Frequency
Genetic Markers
Genome, Human*
Genotype*
Humans
Linkage Disequilibrium
Oligonucleotide Array Sequence Analysis
Polymorphism, Single Nucleotide
Reproducibility of Results
Sensitivity and Specificity

Substances

Genetic Markers
DNA
Fetal Hemoglobin
