Retrospective chart review (RCR) studies rely on the collection and analysis of documented clinical data, a process that can be prone to errors. The aim of this study was to develop a defined set of criteria for evaluating RCR datasets for potential data errors. The Data Error Criteria (DEC) were developed by identifying data coding and data entry errors through a literature review and then classifying them by error type. The DEC comprise three components: general errors, numerical-specific errors, and categorical variable-specific errors. Two reviewers independently applied these criteria, through manual review, to an existing de-identified database. A total of 10,168 errors were identified across 28,656 data points: 2515 general errors, 39 numerical-specific errors, and 7614 categorical variable-specific errors. This total includes redundancies, because some errors fall into more than one category. Input-related categorical variable-specific errors occurred most frequently, followed by errors secondary to blank cells. Inter-rater agreement was near perfect for all categories. Identifying the errors outlined in the DEC before the data analysis stage can be crucial, because such errors can lead to inaccurate calculations and delay study timelines. The DEC provide a framework for evaluating datasets while reducing the time and effort needed to create high-quality RCR databases.
Keywords: Research; data error; retrospective chart review.
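To make the three-way classification concrete, here is a minimal Python sketch of how records could be screened against general, numerical-specific, and categorical variable-specific checks. It is an illustration under stated assumptions, not the DEC themselves: the DEC were applied by manual review, and their individual criteria are not listed in this abstract. The column names (`age`, `sex`), the allowed range and category levels, and the functions `is_blank` and `classify` are all hypothetical. A blank categorical cell is flagged as both a general and a categorical error, which shows how per-category counts can sum to more than the number of distinct erroneous cells.

```python
import math
from collections import Counter

# Hypothetical rules for illustration only; not the published DEC criteria.
NUMERIC_RANGES = {"age": (0, 120)}           # plausible range per numeric column
CATEGORICAL_LEVELS = {"sex": {"M", "F"}}     # allowed codes per categorical column


def is_blank(value):
    """Treat None and whitespace-only strings as blank cells."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def classify(records):
    """Count flagged cells per error category (categories may overlap).

    Returns (counts, flagged_cells): counts per category, and the number of
    distinct cells carrying at least one flag.
    """
    counts = Counter()
    flagged_cells = 0
    for row in records:
        for column, value in row.items():
            flags = set()
            # General check (hypothetical): any blank cell.
            if is_blank(value):
                flags.add("general")
            # Numerical check (hypothetical): non-numeric or out-of-range values.
            if column in NUMERIC_RANGES and not is_blank(value):
                low, high = NUMERIC_RANGES[column]
                try:
                    number = float(value)
                except (TypeError, ValueError):
                    flags.add("numerical")
                else:
                    if math.isnan(number) or not low <= number <= high:
                        flags.add("numerical")
            # Categorical check (hypothetical): blank or unrecognized code.
            if column in CATEGORICAL_LEVELS:
                if is_blank(value) or value not in CATEGORICAL_LEVELS[column]:
                    flags.add("categorical")
            counts.update(flags)
            if flags:
                flagged_cells += 1
    return counts, flagged_cells


if __name__ == "__main__":
    sample = [
        {"age": "45", "sex": "F"},
        {"age": "", "sex": "female"},
        {"age": "200", "sex": ""},
        {"age": "abc", "sex": "M"},
    ]
    counts, flagged = classify(sample)
    print(dict(counts), flagged)
```

In the sample, the blank `sex` cell in the third record counts toward both the general and categorical totals, so the category counts sum to 6 while only 5 distinct cells are flagged, mirroring the kind of redundancy the abstract notes in its error totals.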