Phenotyping is an automated technique for identifying patients diagnosed with a particular disease from electronic health records (EHRs). Phenotyping algorithms should be reproducible, and evaluating them requires EHRs annotated as a gold standard. However, we have found that several types of EHRs cannot be definitively annotated as either CASEs or CONTROLs, and the influence of such "possible patients" on phenotyping algorithms is unknown. To assess these issues, we annotated EHRs for four chronic diseases using information that does not refer directly to the diseases, and we developed two types of phenotyping algorithms for each disease. We confirmed that each disease included different types of possible patients. The performance of the phenotyping algorithms differed depending on whether possible patients were treated as CASEs, regardless of the type of algorithm. Our results indicate that researchers must share their annotation criteria for classifying possible patients so that phenotyping algorithms can be reproduced.
Keywords: Clinical Phenotyping; Data Annotation; Electronic Health Records.