Background and objectives: Nearly all genetic analyses of Parkinson disease (PD) have been conducted in populations of European ancestry. We sought to test whether a machine learning method can extract accurate PD diagnoses from an electronic medical record (EMR) system, to assess whether genetic variants identified in European populations generalize to individuals of African and Hispanic ancestries, and to compare the rates of PD across ancestries.
Methods: A machine learning method using natural language processing was applied to EMRs of US veterans participating in the VA Million Veteran Program (MVP) to identify individuals with PD. These putative cases were vetted via blind chart review by a movement disorder specialist. A polygenic risk score (PRS) of 90 established genetic variants whose genotypes were imputed from a customized Axiom Biobank Array was evaluated in different case groups.
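As an illustration of the PRS step, the following is a minimal sketch of a polygenic risk score computed as a weighted sum of imputed risk-allele dosages. The abstract does not state how the 90 variants were weighted, so the per-variant weights (for example, log odds ratios from earlier European-ancestry studies) are an assumption here, and the function name and the numbers in the example are hypothetical.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele dosages for one individual.

    dosages: imputed risk-allele dosage per variant, each between 0 and 2.
    weights: per-variant effect size (assumed here to be a log odds ratio).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    if any(d < 0 or d > 2 for d in dosages):
        raise ValueError("each dosage must lie between 0 and 2")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example with 4 variants instead of 90; values are made up.
example_dosages = [0.0, 1.0, 2.0, 0.5]
example_weights = [0.1, -0.2, 0.3, 0.4]
example_score = polygenic_risk_score(example_dosages, example_weights)
```

In a case-control comparison, a score like this would typically be standardized against the controls before being entered into a logistic regression, so that the resulting odds ratio is expressed per standard deviation of the score; whether the study did so is not stated in the abstract.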
Results: The EMR prediction scores had a distinct trimodal distribution, with 97% of the high group and only 30% of the middle group having a credible diagnosis of PD. Using the 3,542 cases from the high group matched 4:1 to controls, the PRS was highly predictive in individuals of European ancestry (n = 3,137 cases; OR = 1.82; p = 8.01E-48), and nearly identical effect sizes were seen in individuals of African (n = 184; OR = 2.07; p = 3.4E-4) and Hispanic ancestries (n = 221; OR = 2.13; p = 3.9E-6). The PRS was much less predictive for the 2,757 European ancestry cases who had an ICD code for PD but for whom the machine learning method had lower confidence in the diagnosis. No novel ancestry-specific genetic variants were identified. Compared with individuals of European or Hispanic ancestry, individuals of African ancestry had one-quarter the rate of PD in the 60-70 years age range and one-half the rate in the 70-80 years age range. African American cases had a higher proportion of their DNA originating in Europe compared with African American controls.
Discussion: Machine learning can reliably classify PD using data from a large EMR. Larger studies of non-European populations are required to confirm both the generalizability of PD risk variants identified in populations of European ancestry and the increased risk associated with a higher proportion of European DNA in African Americans.
Copyright © 2023 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology.