Background/aims: Limited resources today demand alternative sources of DNA and phenotype data to traditional prospective, population-based epidemiologic collections.
Methods: To accelerate genomic discovery with an emphasis on diverse populations, we accessed, as part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, all non-European American samples (n = 15,863) available in BioVU, the Vanderbilt University biorepository linked to de-identified electronic medical records, for genomic studies within the larger Population Architecture using Genomics and Epidemiology (PAGE) I study. Because previous studies have cautioned against the secondary use of clinically collected data relative to epidemiologically collected data, we present here a characterization of EAGLE BioVU, including the billing and diagnostic (ICD-9) code distributions for adult and pediatric patients, as well as comparisons of select health metrics (body mass index, glucose, HbA1c, HDL-C, LDL-C, and triglycerides) with those from the population-based National Health and Nutrition Examination Surveys (NHANES) linked to DNA samples (NHANES III, n = 7,159; NHANES 1999-2002, n = 7,839).
Results: Overall, the distributions of billing and diagnostic codes suggest that this clinical sample is a mixture of healthy and sick patients, similar to what is expected for a contemporary American population.
Conclusion: Little bias is observed among health metrics, suggesting this clinical collection is suitable for genomic studies along with traditional epidemiologic cohorts.
© 2015 S. Karger AG, Basel.