Exploring ethnicity dynamics in Wales: a longitudinal population-scale linked data study and development of a harmonised ethnicity spine

BMJ Open. 2024 Aug 3;14(8):e077675. doi: 10.1136/bmjopen-2023-077675.

Abstract

Objective: This study aims to create a national ethnicity spine based on all available ethnicity records in linkable anonymised electronic health record and administrative data sources.

Design: A longitudinal study using anonymised individual-level population-scale ethnicity data from 26 data sources available within the Secure Anonymised Information Linkage Databank.

Setting: The national ethnicity spine is based on longitudinal national data for the population of Wales, UK, over 22 years (2000 to 2021).

Procedure and participants: A total of 46 million ethnicity records for 4 297 694 individuals were extracted, harmonised, deduplicated and made available within a longitudinal research-ready data asset.

Outcome measures: (1) Comparing the distribution of ethnicity records over time for four different selection approaches (latest, mode, weighted mode and composite) across age bands, sex, deprivation quintiles, health board and residential location and (2) distribution and completeness of records against the ONS census 2011.
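
Illustrative sketch (not the authors' code): the abstract names four selection approaches but does not define them, so the example below covers only the two whose meaning follows from their names, "latest" (group on the most recent record) and "mode" (most frequently recorded group). The toy records, the function names and the tie-breaking rule (most recent record wins) are all assumptions made for illustration; the weighted mode and composite approaches are omitted because their definitions are not given here.

```python
from collections import Counter
from datetime import date

# Hypothetical toy records: (person_id, record_date, ethnic_group).
records = [
    ("p1", date(2005, 3, 1), "White"),
    ("p1", date(2012, 6, 9), "White"),
    ("p1", date(2018, 1, 20), "Mixed"),
    ("p2", date(2010, 5, 5), "Asian"),
    ("p2", date(2019, 7, 7), "Mixed"),
]


def select_latest(rows):
    """Return the ethnic group on the most recent record."""
    return max(rows, key=lambda r: r[1])[2]


def select_mode(rows):
    """Return the most frequent group.

    Ties are broken by the most recent record among the tied groups;
    this tie-break is an assumption, not taken from the study.
    """
    counts = Counter(r[2] for r in rows)
    top = max(counts.values())
    tied = [r for r in rows if counts[r[2]] == top]
    return max(tied, key=lambda r: r[1])[2]


def assign(all_records, selector):
    """Group records by person and apply one selection approach."""
    by_person = {}
    for r in all_records:
        by_person.setdefault(r[0], []).append(r)
    return {pid: selector(rows) for pid, rows in by_person.items()}


latest = assign(records, select_latest)
mode = assign(records, select_mode)
print(latest)
print(mode)
```

In this toy data, person p1 is assigned a different group under the two approaches, which is the kind of sensitivity the first outcome measure compares.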

Results: The distribution of the dominant group (white) was minimally affected by the choice among the four selection approaches. Of all other ethnic groups, the mixed group was the most sensitive to the selection approach used: its prevalence was 0.6% under both the latest and mode approaches, 1.1% under the weighted mode and 3.1% under the composite approach. Substantial agreement with the ONS 2011 census was observed for the latest approach (kappa=0.68, 95% CI 0.67 to 0.71) across all subgroups. The record completeness rate was over 95% in 2021.
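
Illustrative sketch (not the authors' code): assuming the reported kappa is Cohen's kappa for agreement between two categorical labellings of the same individuals, it is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the marginal frequencies. The function name and the toy labels below are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of individuals with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal frequencies of each source.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both sources use a single identical category; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels for four individuals.
spine = ["White", "White", "Asian", "Mixed"]
census = ["White", "White", "Asian", "Asian"]
kappa = cohen_kappa(spine, census)
print(round(kappa, 3))
```

Under the commonly used Landis and Koch benchmarks, values between 0.61 and 0.80 are described as substantial agreement.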

Conclusion: The population-scale ethnicity spine provides robust ethnicity measures for healthcare research in Wales and a template that can easily be deployed in other trusted research environments in the UK and beyond.

Keywords: EPIDEMIOLOGY; Health policy; PUBLIC HEALTH.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Child
  • Child, Preschool
  • Electronic Health Records / statistics & numerical data
  • Ethnicity* / statistics & numerical data
  • Female
  • Humans
  • Infant
  • Infant, Newborn
  • Longitudinal Studies
  • Male
  • Middle Aged
  • Wales
  • Young Adult