Improving the Accuracy and Precision of Disease Identification When Utilizing EHR Data for Research: The Case for Hepatocellular Carcinoma

Res Sq [Preprint]. 2024 Oct 18:rs.3.rs-4993106. doi: 10.21203/rs.3.rs-4993106/v1.

Abstract

Objective: We assessed how well ICD codes identify patients with hepatocellular carcinoma (HCC) in a large academic health system. We also determined whether an algorithm combining multiple ICD codes could identify HCC cases in electronic health record (EHR) data with higher accuracy and precision than a single ICD code.

Results: In our cohort of 1,007 established ambulatory care patients with potential HCC, relying on a single ICD code entry for HCC (ICD-9-CM 155.0 or ICD-10-CM C22.0) yielded 58% false positives (patients who were not true HCC cases on chart review). We developed an ICD code-based algorithm that prioritized positive predictive value (PPV), F-score, and accuracy to minimize both false positives and false negatives. The highest-performing algorithm required at least 10 ICD code entries for HCC and required the total number of HCC code entries to exceed the total number of code entries for non-HCC malignancies. The algorithm performed well (PPV 97.4%, F-score 0.92, accuracy 94%) and was internally validated on a separate sample of potential HCC cases (PPV 92.3%, F-score 0.90, accuracy 91%). Our findings support assessing the accuracy and precision of ICD codes before using EHR data to study HCC more broadly.
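The decision rule described in the Results can be sketched in a few lines of Python, together with the metrics used to compare it against the single-code approach. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each patient's EHR data reduce to a list of ICD code strings (one element per recorded entry) and that the set of non-HCC malignancy codes is supplied by the caller. The abstract does not list those codes; `C18.9` and `C78.7` below are illustrative only. It also takes "F-score" to mean F1, the harmonic mean of PPV and sensitivity. The function names and the toy cohort are hypothetical and are not study data.

```python
# HCC codes named in the abstract: ICD-9-CM 155.0 and ICD-10-CM C22.0.
# Assumption: codes are stored as dotted strings exactly like these.
HCC_CODES = frozenset({"155.0", "C22.0"})


def count_entries(code_entries, codes):
    """Count how many entries in a patient's code list fall in `codes`.
    Assumption: each list element is one recorded ICD code entry."""
    return sum(1 for code in code_entries if code in codes)


def algorithm_rule(code_entries, non_hcc_malignancy_codes, min_hcc_entries=10):
    """Hypothetical multi-code rule: at least `min_hcc_entries` HCC entries
    AND more HCC entries than non-HCC malignancy entries."""
    hcc = count_entries(code_entries, HCC_CODES)
    other = count_entries(code_entries, non_hcc_malignancy_codes)
    return hcc >= min_hcc_entries and hcc > other


def single_code_rule(code_entries):
    """Baseline: any single HCC code entry flags the patient as HCC."""
    return count_entries(code_entries, HCC_CODES) >= 1


def evaluate(predicted, actual):
    """PPV, sensitivity, F1, and accuracy against chart-review labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    tn = sum(1 for p, a in pairs if not p and not a)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * ppv * sensitivity / (ppv + sensitivity)
          if ppv + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"ppv": ppv, "sensitivity": sensitivity, "f1": f1, "accuracy": accuracy}


# Hypothetical toy cohort: (code entries, chart-review label). Not study data.
# C18.9 (colon) and C78.7 (secondary liver) stand in for non-HCC malignancy codes.
NON_HCC = frozenset({"C18.9", "C78.7"})
cohort = [
    (["C22.0"] * 12, True),                    # many HCC entries, no competitors
    (["C22.0"] + ["C78.7"] * 5, False),        # one stray HCC entry
    (["C22.0"] * 15 + ["C18.9"] * 20, False),  # outnumbered by another cancer
    (["C22.0"] * 9, True),                     # true case just under threshold
    (["C18.9"] * 3, False),                    # no HCC entries at all
]
actual = [label for _, label in cohort]
algo = evaluate([algorithm_rule(codes, NON_HCC) for codes, _ in cohort], actual)
single = evaluate([single_code_rule(codes) for codes, _ in cohort], actual)
print(algo)    # ppv=1.0, sensitivity=0.5, accuracy=0.8 (f1 ~0.667)
print(single)  # ppv=0.5, sensitivity=1.0, accuracy=0.6 (f1 ~0.667)
```

On this toy cohort the multi-code rule removes the false positives that the single-code rule admits. The cost is one missed true case that has fewer than 10 HCC entries. This illustrates why the abstract reports PPV alongside F-score and accuracy rather than PPV alone.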
