Design and analysis considerations for combining data from multiple biomarker studies

Abigail Sloan; Yue Song; Mitchell H Gail; Rebecca Betensky; Bernard Rosner; Regina G Ziegler; Stephanie A Smith-Warner; Molin Wang

doi:10.1002/sim.8052

Stat Med. 2019 Apr 15;38(8):1303-1320. doi: 10.1002/sim.8052. Epub 2018 Dec 19.

Authors

Abigail Sloan¹, Yue Song¹, Mitchell H Gail², Rebecca Betensky¹, Bernard Rosner¹,³, Regina G Ziegler², Stephanie A Smith-Warner⁴,⁵, Molin Wang¹,³,⁵

Affiliations

¹ Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts.
² Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.
³ Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.
⁴ Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, Massachusetts.
⁵ Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts.

Abstract

Pooling data from multiple studies improves estimation of exposure-disease associations through increased sample size. However, biomarker exposure measurements can vary substantially across laboratories and often require calibration to a reference assay prior to pooling. We develop two statistical methods for aggregating biomarker data from multiple studies: the full calibration method and the internalized method. The full calibration method calibrates all biomarker measurements regardless of the availability of reference laboratory measurements, whereas the internalized method calibrates only non-reference laboratory measurements. We compare the performance of these two aggregation methods to two-stage methods. Furthermore, we compare the aggregated and two-stage methods when estimating the calibration curve from controls only or from a random sample of individuals from the study cohort. Our findings include the following: (1) Under random sampling for calibration, exposure effect estimates from the internalized method have a smaller mean squared error than those from the full calibration method. (2) Under the controls-only calibration design, the full calibration method yields effect estimates with the least bias. (3) The two-stage approaches produce average effect estimates that are similar to those from the full calibration method under a controls-only calibration design and to those from the internalized method under a random-sample calibration design. We illustrate the methods in an application evaluating the relationship between circulating vitamin D levels and stroke risk in a pooling project of cohort studies.

Keywords: aggregation; between-study variability; calibration; pooling project; two-stage method.
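
The difference between the two aggregation methods described in the abstract can be illustrated with a minimal sketch for a single contributing study. It is not the authors' implementation. It assumes a straight-line calibration model fitted by least squares, a random-sample calibration design, and simulated data; the function names (`fit_calibration`, `full_calibration`, `internalized`) and all numeric settings are hypothetical.

```python
import numpy as np

def fit_calibration(local, reference):
    """Fit reference ~ a + b * local on the calibration subset (assumed linear)."""
    b, a = np.polyfit(local, reference, deg=1)  # polyfit returns slope first
    return a, b

def full_calibration(local, a, b):
    """Full calibration: every subject gets a calibrated value,
    even when a reference laboratory measurement exists."""
    return a + b * local

def internalized(local, reference, a, b):
    """Internalized: keep the reference measurement where it exists (not NaN),
    and use the calibrated local measurement otherwise."""
    calibrated = a + b * local
    return np.where(np.isnan(reference), calibrated, reference)

# Simulated study (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200
true_x = rng.normal(50, 10, n)                   # underlying biomarker level
local = 5 + 0.8 * true_x + rng.normal(0, 2, n)   # local laboratory assay

# Random-sample calibration design: 30 subjects re-assayed at the reference lab.
reference = np.full(n, np.nan)
calib_idx = rng.choice(n, size=30, replace=False)
reference[calib_idx] = true_x[calib_idx] + rng.normal(0, 1, 30)

a, b = fit_calibration(local[calib_idx], reference[calib_idx])
x_full = full_calibration(local, a, b)
x_int = internalized(local, reference, a, b)
```

In this sketch `x_full` and `x_int` differ only for the 30 calibration subjects. Either vector would then serve as the exposure in the pooled disease model; that step is omitted here.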

Publication types

Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural

MeSH terms

Algorithms
Biomarkers*
Calibration*
Data Interpretation, Statistical*
Humans
Odds Ratio
Research Design* / statistics & numerical data

Substances

Biomarkers
