Background: Routinely collected healthcare data provide a valuable resource for epidemiological research. Validation studies have shown that, for most conditions, simple lists of clinical codes can reliably be used for case finding in primary care. However, studies exploring the robustness of this approach are lacking for diseases such as idiopathic pulmonary fibrosis (IPF), which are largely managed in secondary care.
Method: Using the UK's Clinical Practice Research Datalink (CPRD) Aurum dataset, which comprises patient-level primary care records linked to national hospital admissions and cause-of-death data, we compared the positive predictive value (PPV) of eight diagnostic algorithms. Algorithms were developed based on the literature and IPF diagnostic guidelines, using combinations of clinical codes in primary and secondary care (SNOMED-CT or ICD-10) with or without additional information. The PPV of each algorithm was estimated using the death record as the gold standard. We also examined use of the reviewed codes across the study period to assess any change in coding practices over time.
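As an illustration of the PPV estimation described above, the following is a minimal sketch, not the authors' implementation. It assumes the PPV is the proportion of algorithm-flagged individuals confirmed by the gold standard, and it uses a Wilson score interval for the 95% CI; the abstract does not state which interval method was used, so that choice is an assumption. The function name and the example counts are hypothetical.

```python
from math import sqrt


def ppv_with_wilson_ci(confirmed, flagged, z=1.96):
    """Return (ppv, lower, upper) for a case-finding algorithm.

    confirmed: flagged individuals also confirmed by the gold standard
    flagged:   all individuals flagged by the algorithm
    z:         normal quantile (1.96 gives an approximate 95% interval)

    The Wilson score interval is an assumed choice, not taken from the source.
    """
    if flagged <= 0 or not 0 <= confirmed <= flagged:
        raise ValueError("need 0 <= confirmed <= flagged and flagged > 0")
    p = confirmed / flagged
    denom = 1 + z ** 2 / flagged
    centre = (p + z ** 2 / (2 * flagged)) / denom
    half_width = z * sqrt(p * (1 - p) / flagged + z ** 2 / (4 * flagged ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Hypothetical counts, for illustration only (not study data).
ppv, lower, upper = ppv_with_wilson_ci(confirmed=75, flagged=100)
print(f"PPV = {ppv:.1%} (95% CI: {lower:.1%}-{upper:.1%})")
```

The same function could be applied to each of the eight algorithms in turn, given the number of individuals each one flags and the number of those confirmed in the death record.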
Result: A total of 17,559 individuals had at least one record indicative of IPF in one or more of our three linked datasets between 2008 and 2018. The PPV of case-finding algorithms based on clinical codes alone ranged from 64.4% (95% CI: 63.3-65.3) for a "broad" codeset to 74.9% (95% CI: 72.8-76.9) for a "narrow" codeset comprising highly specific codes. Adding confirmatory evidence, such as a CT scan, increased the PPV of our narrow code-based algorithm to 79.2% (95% CI: 76.4-81.8) but reduced the sensitivity to under 10%. Adding evidence of hospitalisation to the standalone code-based algorithms also improved PPV (PPV = 78.4% vs. 64.4%; sensitivity = 53.5% vs. 38.1%). IPF coding practices changed over time, with increased use of specific IPF codes.
Conclusion: High diagnostic validity was achieved by using a restricted set of IPF codes. While adding confirmatory evidence increased diagnostic accuracy, the benefits of this approach need to be weighed against the inevitable loss of sample size and convenience. We recommend an algorithm based on a broader IPF code set coupled with evidence of hospitalisation.
Keywords: CPRD; Diagnostic codes; HES; Idiopathic pulmonary fibrosis; Interstitial lung disease; Pulmonary fibrosis; Validation.
© 2023. The Author(s).