Purpose: To comply with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule, many real-world data (RWD) providers mask a patient's date of birth by supplying only the year of birth to data users. This lack of granularity in patient age is a challenge when using RWD, especially in pediatric research studies. This study evaluates a proxy for patient date of birth using electronic health record (EHR) data.
Methods: This validation study leverages a retrospective cohort of EHR data from Mass General Brigham (MGB) patients born between January 1, 2018, and December 31, 2022, to assess the use of the date of a patient's first observed International Classification of Diseases 10th Revision Clinical Modification (ICD-10-CM) day-of birth code (Z37* or Z38*) as a proxy for date of birth. Alternative proxy measures such as date of first other infancy-related ICD-10-CM code and date of first clinical activity were also assessed.
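The primary proxy described above can be illustrated with a minimal Python sketch. It is not the study's actual code: the record layout (tuples of patient ID, ICD-10-CM code, and diagnosis date), the function names, and the toy data are all hypothetical. The sign convention (proxy date minus true date of birth) and the use of an absolute 30-day window are also assumptions, since the abstract does not specify them.

```python
from datetime import date
from statistics import mean, stdev

# ICD-10-CM day-of birth code families named in the Methods (Z37*, Z38*).
BIRTH_CODE_PREFIXES = ("Z37", "Z38")


def first_birth_code_dates(diagnoses):
    """Return {patient_id: earliest date on which a Z37*/Z38* code appears}.

    `diagnoses` is an iterable of (patient_id, icd10cm_code, dx_date) tuples;
    this layout is a hypothetical stand-in for an EHR diagnosis table.
    """
    first = {}
    for patient_id, code, dx_date in diagnoses:
        if code.upper().startswith(BIRTH_CODE_PREFIXES):
            if patient_id not in first or dx_date < first[patient_id]:
                first[patient_id] = dx_date
    return first


def proxy_agreement(true_dob, diagnoses, window_days=30):
    """Compare the proxy date with the true date of birth.

    Differences are computed as (proxy date - true date of birth) in days,
    which is an assumed sign convention.
    """
    proxies = first_birth_code_dates(diagnoses)
    diffs = [(proxies[p] - dob).days for p, dob in true_dob.items() if p in proxies]
    if not diffs:
        raise ValueError("no patient has a birth code")
    return {
        "n_with_code": len(diffs),
        "coverage": len(diffs) / len(true_dob),
        "mean_diff_days": mean(diffs),
        "sd_diff_days": stdev(diffs) if len(diffs) > 1 else 0.0,
        "share_within_window": sum(abs(d) <= window_days for d in diffs) / len(diffs),
    }


# Toy data, invented purely for illustration.
true_dob = {
    "A": date(2020, 3, 1),
    "B": date(2021, 7, 15),
    "C": date(2019, 11, 30),
}
diagnoses = [
    ("A", "Z38.00", date(2020, 3, 1)),
    ("A", "Z38.00", date(2020, 3, 3)),   # later repeat, ignored by the proxy
    ("B", "Z38.01", date(2021, 7, 17)),
    ("B", "P07.30", date(2021, 7, 15)),  # not a day-of birth code
    ("C", "J06.9", date(2020, 1, 10)),   # patient C has no birth code
]

summary = proxy_agreement(true_dob, diagnoses)
print(summary)
```

Patients without any Z37* or Z38* code are left out of the difference statistics and counted only in the coverage denominator, mirroring how the primary analysis is restricted to patients with a birth code.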
Results: Of 82 398 patients born during the five-year study period, 58 047 (70.4%) had an ICD-10-CM birth code and were included in the primary analysis. The mean difference between the true date of birth and the date of the first observed birth code was 0.3 days, with a standard deviation of 15.0 days. The first observed birth code occurred within 30 days of the true date of birth in 99.9% of cases.
Conclusion: Results from this study suggest that the date of the first day-of ICD-10-CM birth code can be used as a proxy for true patient date of birth in pediatric RWD studies.
Keywords: pediatric research; proxy measure; real‐world data.
© 2025 John Wiley & Sons Ltd.