Background: Many individuals with hypertension remain undiagnosed. We aimed to develop a predictive model for hypertension using diagnostic codes from prevailing electronic medical records in Swedish primary care.
Methods: This sex- and age-matched case-control (1:5) study included patients aged 30-65 years living in the Stockholm Region, Sweden, with a newly recorded diagnosis of hypertension during 2010-19 (cases) and individuals without a recorded hypertension diagnosis during 2010-19 (controls), for a total of 507,618 individuals. Patients with diagnoses of cardiovascular diseases or diabetes were excluded. A stochastic gradient boosting machine learning model was constructed using the 1,309 most frequently registered ICD-10 codes from primary care during the three years prior to the hypertension diagnosis.
Results: The model showed an area under the curve (95 % confidence interval) of 0.748 (0.742-0.753) for females and 0.745 (0.740-0.751) for males for predicting a diagnosis of hypertension within three years. The sensitivity was 63 % for females and 68 % for males, and the specificity was 76 % and 73 %, respectively. The 25 diagnoses that contributed the most to the model for females and males all exhibited a normalized relative influence >1 %. The codes contributing most to the model, all with an odds ratio of marginal effects >1 for both sexes, were dyslipidaemia, obesity, and encountering health services in other circumstances.
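As an illustration of how the performance measures reported above are defined, the following minimal sketch computes sensitivity, specificity, and the area under the curve from predicted risk scores. It is not the study's code: the toy labels, the scores, the 0.5 threshold, and the function names `sensitivity_specificity` and `auc` are all assumptions made for this example.

```python
# Illustrative only: toy data and a hypothetical 0.5 threshold, not study data.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of cases flagged; specificity: share of controls not flagged.
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy example: 3 cases (1) and 5 controls (0) with model risk scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

threshold = 0.5  # hypothetical cut-off for flagging a patient as at risk
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the curve summarises discrimination across all thresholds, which is why both kinds of measure are reported.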
Conclusions: This machine learning model, using prevailing recorded diagnoses within primary health care, may contribute to the identification of patients at risk of unrecognized hypertension. The added value of this predictive model beyond information on blood pressure warrants further study.
Keywords: Artificial intelligence; Family practice; Gradient boosting; Hypertension; Opportunistic screening; Prediction.
© 2024 The Authors.