Background: Primary care records from the UK have frequently been used to identify episodes of upper gastrointestinal bleeding in studies of drug toxicity because of their comprehensive population coverage and longitudinal recording of prescriptions and diagnoses. Recent linkage of primary and secondary care data within England has augmented these data, but the timing and coding of concurrent events in the two sources, and how the definition of events in linked data affects their occurrence and 28-day mortality, are not known.
Methods: We used the recently linked English Hospital Episode Statistics (HES) and General Practice Research Database (GPRD), covering 1997 to 2010, to define events in three ways: a specific upper gastrointestinal bleed code in either dataset; a specific bleed code in both datasets; or a less specific but plausible code from the linked dataset.
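The three case definitions can be sketched as filters over linked patient records. This is an illustrative sketch only, not the study's code: the `Record` type, the `"specific"`/`"plausible"` code classes, the definition names, patient-level counting, and the 60-day window standing in for "within 2 months" are all assumptions. The third definition is read here as a specific code in one dataset matched by a specific or plausible code in the other.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: "within 2 months" is approximated as 60 days either side.
WINDOW_DAYS = 60


@dataclass(frozen=True)
class Record:
    patient_id: int
    source: str       # "GPRD" (primary care) or "HES" (secondary care)
    event_date: date
    code_class: str   # hypothetical labels: "specific" or "plausible"


def _matched(rec: Record, records: list[Record], classes: set[str]) -> bool:
    """True if the same patient has a record in the other dataset, with a
    code class in `classes`, dated within WINDOW_DAYS of `rec`."""
    other = "HES" if rec.source == "GPRD" else "GPRD"
    return any(
        r.patient_id == rec.patient_id
        and r.source == other
        and r.code_class in classes
        and abs((r.event_date - rec.event_date).days) <= WINDOW_DAYS
        for r in records
    )


def define_cases(records: list[Record], definition: str) -> set[int]:
    """Return IDs of patients meeting one of three hypothetical definitions."""
    specific = [r for r in records if r.code_class == "specific"]
    if definition == "either":
        # A specific code in either dataset (broadest).
        keep = specific
    elif definition == "both":
        # A specific code confirmed by a specific code in the linked dataset.
        keep = [r for r in specific if _matched(r, records, {"specific"})]
    elif definition == "specific_plus_plausible":
        # A specific code supported by a specific or plausible linked code.
        keep = [r for r in specific
                if _matched(r, records, {"specific", "plausible"})]
    else:
        raise ValueError(f"unknown definition: {definition}")
    return {r.patient_id for r in keep}
```

By construction the "both" cases are a subset of the "specific_plus_plausible" cases, which are in turn a subset of the "either" cases, so narrowing the definition can only drop patients.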
Results: Under this approach, 81% of bleeds defined in secondary care had a corresponding plausible code in primary care within 2 months. However, only 62% of bleeds defined in primary care had a corresponding plausible HES admission within 2 months. The more restrictive, specific case definitions excluded severe events and almost halved the 28-day case fatality compared with broader, more sensitive definitions.
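The concordance percentages and the 28-day case fatality in these results are both simple proportions. A minimal Python sketch of how they might be computed is given below, assuming hypothetical `Record` fields and code classes, a 60-day window for "2 months", and a single index event date per patient; none of these names come from the study itself.

```python
from dataclasses import dataclass
from datetime import date

WINDOW_DAYS = 60  # assumption: "within 2 months" approximated as 60 days


@dataclass(frozen=True)
class Record:
    patient_id: int
    source: str       # "GPRD" or "HES"
    event_date: date
    code_class: str   # hypothetical labels: "specific" or "plausible"


def concordance(records: list[Record], index_source: str) -> float:
    """Share of specific-code records in `index_source` that have a specific
    or plausible code in the other dataset within WINDOW_DAYS."""
    other = "HES" if index_source == "GPRD" else "GPRD"
    index = [r for r in records
             if r.source == index_source and r.code_class == "specific"]
    if not index:
        return float("nan")
    matched = sum(
        any(o.patient_id == r.patient_id
            and o.source == other
            and o.code_class in ("specific", "plausible")
            and abs((o.event_date - r.event_date).days) <= WINDOW_DAYS
            for o in records)
        for r in index
    )
    return matched / len(index)


def case_fatality_28d(event_dates: dict[int, date],
                      death_dates: dict[int, date]) -> float:
    """Proportion of patients dying within 28 days of their index event.
    `death_dates` omits patients with no recorded death."""
    if not event_dates:
        return float("nan")
    deaths = sum(
        1 for pid, d in event_dates.items()
        if pid in death_dates and 0 <= (death_dates[pid] - d).days <= 28
    )
    return deaths / len(event_dates)
```

Calling `case_fatality_28d` separately on the patients captured by each case definition is one way to see how a narrower definition can shift the estimate.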
Conclusions: Restrictive definitions of gastrointestinal bleeding in linked datasets fail to capture the full heterogeneity of coding that can follow complex clinical events. Conversely, too broad a definition in primary care introduces events that were not severe enough to warrant hospital admission. Ignoring these issues may unwittingly introduce selection bias into a study's results.