The use of outcome-dependent sampling with longitudinal data analysis has previously been shown to improve efficiency in the estimation of regression parameters. The motivating scenario is when outcome data exist for all cohort members but key exposure variables will be gathered only on a subset. Inference with outcome-dependent sampling designs that also incorporates incomplete information from those individuals who did not have their exposure ascertained has been investigated for univariate but not longitudinal outcomes. Therefore, with a continuous longitudinal outcome, we explore the relative contributions of various sources of information toward the estimation of key regression parameters using a likelihood framework. We evaluate the efficiency gains that alternative estimators might offer over random sampling, and we offer insight into their relative merits in select practical scenarios. Finally, we illustrate the potential impact of design and analysis choices using data from the Cystic Fibrosis Foundation Patient Registry.
Keywords: biased sampling; epidemiological study design; longitudinal data analysis.
Copyright © 2018 John Wiley & Sons, Ltd.