Harmonizing Norwegian registries onto OMOP common data model: Mapping challenges and opportunities for pregnancy and COVID-19 research

Int J Med Inform. 2024 Nov:191:105602. doi: 10.1016/j.ijmedinf.2024.105602. Epub 2024 Aug 14.

Abstract

Objective: Norwegian health registries covering the entire population are used for administration, research, and emergency preparedness. We harmonized these data onto the Observational Medical Outcomes Partnership common data model (OMOP CDM) and enriched real-world data in OMOP format with COVID-19-related data.

Methods: Data from six registries (2018-2021) covering birth registrations, selected primary and secondary care events, vaccinations, and communicable disease notifications were mapped onto OMOP CDM v5.3. An Extract-Transform-Load (ETL) pipeline was developed on simulated data using data characterization documents and scanning tools. We ran dashboard quality checks and cohort generation, investigated differences between source and mapped data, and refined the ETL accordingly.
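The core of an OMOP transform step is resolving each source code to a standard concept, typically by following the vocabulary's 'Maps to' relationship, and routing the record to the CDM table that matches the standard concept's domain. The sketch below illustrates that idea with toy in-memory tables. It is not the authors' pipeline: the concept IDs, the `Concept` dataclass, `map_source_code`, and the fallback of unmapped records to the observation table are hypothetical simplifications. Only the use of concept_id 0 for unmapped codes follows a general OMOP convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Toy stand-in for a row of the OMOP CONCEPT table (subset of columns)."""
    concept_id: int
    vocabulary_id: str
    concept_code: str
    domain_id: str
    standard: bool


# Hypothetical concept IDs; the codes are used for illustration only.
CONCEPTS = [
    Concept(101, "ICD10", "U07.1", "Condition", False),
    Concept(201, "SNOMED", "840539006", "Condition", True),
    Concept(102, "ICPC2", "R99", "Observation", False),
]

# Toy CONCEPT_RELATIONSHIP rows with relationship_id 'Maps to' (source -> standard).
MAPS_TO = {101: 201}

# Domain of the standard concept decides the target CDM table.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Measurement": "measurement",
    "Observation": "observation",
}


def map_source_code(vocabulary_id, code):
    """Return (standard_concept_id, target_table) for one source code.

    Unmapped codes get concept_id 0; sending them to 'observation'
    is a simplifying assumption of this sketch.
    """
    by_key = {(c.vocabulary_id, c.concept_code): c for c in CONCEPTS}
    by_id = {c.concept_id: c for c in CONCEPTS}
    source = by_key.get((vocabulary_id, code))
    if source is None:
        return 0, "observation"
    # Standard concepts map to themselves; others follow 'Maps to'.
    target_id = source.concept_id if source.standard else MAPS_TO.get(source.concept_id)
    if target_id is None:
        return 0, "observation"
    target = by_id[target_id]
    return target.concept_id, DOMAIN_TO_TABLE.get(target.domain_id, "observation")


print(map_source_code("ICD10", "U07.1"))
```

Codes that fall through to concept_id 0 in a sketch like this are the natural candidates for the manual mapping and custom-concept work reported in the results.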

Results: We mapped 1.5 billion rows of data covering 5,673,845 individuals. Among these were 804,277 pregnancies, 483,585 mothers together with 792,477 children, and 472,948 fathers. We identified 382,516 positive COVID-19 tests in 380,794 patients. These figures are consistent with those obtained from the source data. In addition to 11 million source codes mapped automatically, we mapped 237 non-standard codes to standard concepts and introduced 38 custom concepts to accommodate pregnancy-related terminologies not supported by the OMOP CDM vocabularies. A total of 3,700 of 3,705 checks (99.8%) passed. The 5 failed checks could be explained by the nature of the data and involved only a small number of records.
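Custom concepts for terms that the standard vocabularies do not cover are commonly given concept IDs above 2,000,000,000, a range OMOP reserves for local use. The sketch below shows one way such rows could be generated. The function name `allocate_custom_concepts`, the vocabulary name, the example terms, and the choice to flag the new concepts as standard are all hypothetical and do not describe how the 38 custom concepts in this study were built. Only a subset of CONCEPT table columns is produced.

```python
# OMOP reserves concept IDs above 2,000,000,000 for site-specific concepts.
CUSTOM_CONCEPT_START = 2_000_000_001


def allocate_custom_concepts(unsupported_terms, vocabulary_id, domain_id,
                             start=CUSTOM_CONCEPT_START):
    """Build CONCEPT-style rows for source terms with no standard equivalent.

    unsupported_terms maps source code -> human-readable name.
    Codes are sorted so that ID assignment is deterministic across ETL runs.
    Marking the rows standard ('S') is an assumption of this sketch, made so
    they could populate *_concept_id fields directly.
    """
    rows = []
    for offset, (code, name) in enumerate(sorted(unsupported_terms.items())):
        rows.append({
            "concept_id": start + offset,
            "concept_name": name,
            "domain_id": domain_id,
            "vocabulary_id": vocabulary_id,
            "concept_code": code,
            "standard_concept": "S",
        })
    return rows


# Hypothetical pregnancy-related source terms and vocabulary name.
terms = {
    "PREG_02": "Hypothetical pregnancy term B",
    "PREG_01": "Hypothetical pregnancy term A",
}
for row in allocate_custom_concepts(terms, "LOCAL_PREGNANCY", "Observation"):
    print(row["concept_id"], row["concept_code"], row["concept_name"])
```

Sorting the source codes before assigning IDs keeps the IDs stable when the ETL is rerun on the same term list, which matters if cohort definitions refer to the custom concepts.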

Discussion and conclusion: Norwegian registry data were successfully harmonized onto the OMOP CDM with a high level of concordance and provide a valuable source for federated COVID-19-related research. Our mapping experience is highly valuable for data partners working with Nordic health registries.

Keywords: COVID-19; Common data model; Health registries; OMOP; Phenotyping.

MeSH terms

  • COVID-19* / epidemiology
  • COVID-19* / prevention & control
  • Female
  • Humans
  • Male
  • Norway / epidemiology
  • Pregnancy
  • Registries*
  • SARS-CoV-2