Purpose: To describe the development of INSIGHT, a real-world data quality tool to assess completeness, consistency, and fitness-for-purpose of observational health data sources.
Methods: We designed a three-level pipeline of data quality assessments (DQAs) to be run on ConcePTION Common Data Model (CDM) instances. The pipeline is implemented in R.
Results: INSIGHT is an open-source tool that identifies potential data quality issues in CDM-standardized instances through the systematic execution and summary of over 588 configurable DQAs. Level 1 focuses on conformance to the ConcePTION CDM specifications. Level 2 evaluates the temporal plausibility of events and the uniqueness of records. Level 3 provides an overview of distributions, outliers, and trends over time to facilitate fit-for-purpose evaluation. Thus, levels 1 and 2 assure proper data standardization, while level 3 provides information on the study population and potential sub-populations. The DQAs are run locally and assessed centrally by a data quality reviewer together with the data access provider's representatives.
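The three levels can be illustrated with a minimal sketch. This is not INSIGHT's code (INSIGHT is written in R and runs on ConcePTION CDM instances); it is a Python illustration in which the record layout, the field names (`person_id`, `event_code`, `event_date`, `birth_date`), the sample values, and all function names are hypothetical assumptions chosen for brevity.

```python
from collections import Counter
from datetime import date

# Hypothetical, simplified records; NOT the actual ConcePTION CDM layout.
persons = {
    "P1": {"birth_date": date(1990, 5, 1)},
    "P2": {"birth_date": date(2001, 3, 15)},
}
events = [
    {"person_id": "P1", "event_code": "O80", "event_date": date(2020, 1, 10)},
    {"person_id": "P1", "event_code": "O80", "event_date": date(2020, 1, 10)},  # exact duplicate
    {"person_id": "P2", "event_code": "O80", "event_date": date(1999, 7, 4)},   # before birth
    {"person_id": "P2", "event_code": None, "event_date": date(2021, 2, 2)},    # missing code
]

REQUIRED_FIELDS = ("person_id", "event_code", "event_date")  # assumed for this sketch


def level1_conformance(rows):
    """Level 1 (conformance): indices of rows with a missing required field."""
    return [
        i for i, row in enumerate(rows)
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS)
    ]


def level2_temporal(rows, persons):
    """Level 2 (temporal plausibility): indices of events dated before birth."""
    flagged = []
    for i, row in enumerate(rows):
        person = persons.get(row["person_id"])
        if person and row["event_date"] and row["event_date"] < person["birth_date"]:
            flagged.append(i)
    return flagged


def level2_duplicates(rows):
    """Level 2 (uniqueness): record keys that occur more than once."""
    counts = Counter(
        (row["person_id"], row["event_code"], row["event_date"]) for row in rows
    )
    return [key for key, n in counts.items() if n > 1]


def level3_counts_by_year(rows):
    """Level 3 (distributions and trends): number of events per calendar year."""
    counts = Counter(row["event_date"].year for row in rows if row["event_date"])
    return dict(sorted(counts.items()))


report = {
    "level1_missing_required": level1_conformance(events),
    "level2_before_birth": level2_temporal(events, persons),
    "level2_duplicates": level2_duplicates(events),
    "level3_events_per_year": level3_counts_by_year(events),
}
print(report)
```

In this sketch, the level 1 and level 2 checks flag individual records, whereas the level 3 summary is descriptive and is meant to be interpreted by a person judging fitness-for-purpose for a given study.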
Discussion: Data quality is the sum of several internal and external features of the data. While DQAs can provide reassurance about fitness-for-purpose for secondary-use data sources, improvements in data collection are essential to reduce errors and enhance overall data quality for real-world evidence.
Conclusion: INSIGHT aims to support clinical and regulatory decision-making for medicines and vaccines by evaluating the quality of observational health data sources and thereby informing fit-for-purpose assessment. Assessing and improving data quality will enhance the reliability and quality of the generated evidence.
Study registration: This research was registered in the EU PAS Register under number EU50142.
Keywords: common data model; data quality; drug safety; multi‐database studies; pregnancy; quality checks; real‐world data.
© 2025 The Author(s). Pharmacoepidemiology and Drug Safety published by John Wiley & Sons Ltd.