Evaluating biomarkers in epidemiological studies can be expensive and time consuming. Many investigators use techniques such as random sampling or pooling biospecimens in order to cut costs and save time on experiments. Commonly, analyses based on pooled data are strongly restricted by distributional assumptions that are challenging to validate because of the pooled biospecimens. Random sampling provides data that can be easily analyzed. However, random sampling methods are not optimal cost-efficient designs for estimating means. We propose and examine a cost-efficient hybrid design that involves taking a sample of both pooled and unpooled data in an optimal proportion in order to efficiently estimate the unknown parameters of the biomarker distribution. In addition, we find that this design can be used to estimate and account for different types of measurement and pooling error, without the need to collect validation data or repeated measurements. We show an example where application of the hybrid design leads to minimization of a given loss function based on variances of the estimators of the unknown parameters. Monte Carlo simulation and biomarker data from a study on coronary heart disease are used to demonstrate the proposed methodology.
Published in 2010 by John Wiley & Sons, Ltd.