When electronic health records are used to help design and conduct clinical trials, an essential first step is to select eligible patients from those records, a task known as electronic health record phenotyping. We present two novel statistical methods for electronic health record phenotyping. The first reduces the need for gold-standard control patients when developing phenotyping algorithms; the second corrects for the bias that study samples contaminated with ineligible subjects introduce into downstream analyses.
Keywords: Electronic health records; anchor variable; case contamination; phenotyping.