Efficient logistic regression designs under an imperfect population identifier

Biometrics. 2014 Mar;70(1):175-84. doi: 10.1111/biom.12106. Epub 2013 Nov 21.

Abstract

Motivated by actual study designs, this article considers efficient logistic regression designs where the population is identified with a binary test that is subject to diagnostic error. We consider the case where the imperfect test is obtained on all participants, while the gold standard test is measured on a small, selected subsample. Under maximum-likelihood estimation, we evaluate the optimal design in terms of sample selection as well as verification. We show that there may be substantial efficiency gains by choosing a small percentage of individuals who test negative on the imperfect test for inclusion in the sample (e.g., verifying 90% test-positive cases). We also show that a two-stage design may be a good practical alternative to a fixed design in some situations. Under optimal and nearly optimal designs, we use simulations to compare maximum-likelihood and semi-parametric efficient estimators under correctly specified and misspecified models. The methodology is illustrated with an analysis from a diabetes behavioral intervention trial.
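As a rough illustration of the sampling step described in the abstract, the sketch below simulates an imperfect binary test on every participant and then selects a subsample for gold standard verification, stratified by the imperfect test result. It is not the authors' method or code and covers only sample selection, not estimation. The function names, the prevalence (0.3), the sensitivity (0.85), the specificity (0.90), and the 10% verification fraction among test negatives are all assumptions chosen for illustration; only the 90% test-positive fraction echoes the example in the abstract.

```python
import random

def simulate_imperfect_test(true_status, sensitivity, specificity, rng):
    """Return 0/1 imperfect-test results for a list of 0/1 true statuses."""
    results = []
    for d in true_status:
        # Probability of a positive test depends on the true status.
        p_pos = sensitivity if d == 1 else 1.0 - specificity
        results.append(1 if rng.random() < p_pos else 0)
    return results

def choose_verification_sample(test_results, frac_pos, frac_neg, rng):
    """Pick indices to receive the gold standard, stratified by test result."""
    pos = [i for i, t in enumerate(test_results) if t == 1]
    neg = [i for i, t in enumerate(test_results) if t == 0]
    n_pos = round(frac_pos * len(pos))
    n_neg = round(frac_neg * len(neg))
    # Sample without replacement within each stratum.
    return sorted(rng.sample(pos, n_pos) + rng.sample(neg, n_neg))

# Hypothetical settings (assumptions, not values from the article).
rng = random.Random(0)
true_status = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]
tests = simulate_imperfect_test(true_status, sensitivity=0.85, specificity=0.90, rng=rng)
verified = choose_verification_sample(tests, frac_pos=0.90, frac_neg=0.10, rng=rng)
```

Here `verified` holds the indices of participants who would receive the gold standard test. Varying `frac_pos` and `frac_neg` is one way to explore how the mix of test-positive and test-negative individuals changes under different designs.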

Keywords: Case-control designs; Diagnostic accuracy; Epidemiologic designs; Measurement error; Misclassification.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Adolescent
  • Behavior Therapy / standards
  • Child
  • Computer Simulation
  • Diabetes Mellitus, Type 1 / psychology
  • Diagnostic Errors / adverse effects*
  • Glycated Hemoglobin / analysis
  • Humans
  • Likelihood Functions*
  • Logistic Models*
  • Research Design*

Substances

  • Glycated Hemoglobin A