Objective: The study aimed to develop and validate algorithms for identifying people with type 1 and type 2 diabetes in the All of Us Research Program (AoU) cohort, using electronic health record (EHR) and survey data.
Research design and methods: Two sets of algorithms were developed: one using only EHR data (EHR-only) and the other using a combination of EHR and survey data (EHR+). Their performance was evaluated by testing their association with polygenic scores for both type 1 and type 2 diabetes.
Results: For type 1 diabetes, the EHR-only algorithm showed a stronger association with the type 1 diabetes (T1D) polygenic score (p=3×10⁻⁵) than the EHR+ algorithm. For type 2 diabetes, the EHR+ algorithm outperformed both the EHR-only algorithm and the existing AoU definition, identifying additional cases (25.79% and 22.57% more, respectively) and showing a stronger association with the type 2 diabetes (T2D) polygenic score (DeLong p=0.03 and 1×10⁻⁴, respectively).
Conclusions: We provide new validated definitions of type 1 and type 2 diabetes in AoU and make them available to researchers. By ensuring consistent diabetes definitions, these algorithms pave the way for high-quality diabetes research and future clinical discoveries.