The performance of a medical diagnostic test is often evaluated by comparing the outcome of the test to the patient's true disease state. Receiver operating characteristic analysis may then be used to summarize test accuracy. In practice, however, such analysis may encounter several complications. One is verification bias: gold standard assessment of disease status may be only partially available, and the probability that disease status is ascertained may depend on both the test result and characteristics of the subject. A second is that tests interpreted by the same rater may not be independent. Using estimating equations, we generalize previous methods that address these problems. We compare alternative estimators of accuracy, using robust sandwich variance estimators to permit valid asymptotic inference. We suggest that in an observational cohort study where rich covariate information is available, a weighted estimating equations approach may be preferable for its robustness against model misspecification. We apply the methodology to mammography as performed by community radiologists.
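To make the verification bias problem concrete, the following is a minimal sketch of inverse-probability weighting for sensitivity and specificity at a single test threshold. It is an illustration under stated assumptions, not the estimator developed in the paper: the verification probabilities are taken as known and depend only on the test result, the cohort counts are invented for the example, and the function names (`build_cohort`, `naive_accuracy`, `ipw_accuracy`) are hypothetical. In practice the verification probabilities would be estimated from the test result and covariates, and a sandwich variance estimator would account for that estimation and for within-rater correlation; neither step is shown here.

```python
import numpy as np


def build_cohort():
    """Build a made-up cohort of 1000 subjects from cell counts.

    Each cell is (test positive, diseased, verified, count). The counts
    are illustrative only: 200 test positives (160 diseased) verified at
    rate 0.5, and 800 test negatives (40 diseased) verified at rate 0.1.
    """
    cells = [
        (1, 1, 1, 80), (1, 1, 0, 80),
        (1, 0, 1, 20), (1, 0, 0, 20),
        (0, 1, 1, 4), (0, 1, 0, 36),
        (0, 0, 1, 76), (0, 0, 0, 684),
    ]
    patterns = np.array([c[:3] for c in cells])
    counts = [c[3] for c in cells]
    rows = np.repeat(patterns, counts, axis=0)
    test, disease, verified = rows.T.astype(bool)
    return test, disease, verified


def naive_accuracy(test, disease, verified):
    """Sensitivity and specificity using verified subjects only."""
    sens = test[verified & disease].mean()
    spec = (~test[verified & ~disease]).mean()
    return sens, spec


def ipw_accuracy(test, disease, verified, p_verify):
    """Inverse-probability-weighted sensitivity and specificity.

    Verified subjects get weight 1 / P(verified); unverified subjects
    get weight 0, so their (unobserved) disease status is never used.
    """
    w = np.where(verified, 1.0 / p_verify, 0.0)
    sens = np.sum(w * (disease & test)) / np.sum(w * disease)
    spec = np.sum(w * (~disease & ~test)) / np.sum(w * ~disease)
    return sens, spec


test, disease, verified = build_cohort()

# Assumed known verification probabilities, depending on test result only.
p_verify = np.where(test, 0.5, 0.1)

naive_sens, naive_spec = naive_accuracy(test, disease, verified)
ipw_sens, ipw_spec = ipw_accuracy(test, disease, verified, p_verify)
```

In this constructed cohort the full-data sensitivity and specificity are 0.80 and 0.95. Because test positives are verified five times as often as test negatives, the verified-only estimates shift to roughly 0.95 and 0.79, whereas the weighted estimates recover 0.80 and 0.95. The weighting works here only because the assumed verification probabilities are correct.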