The HiGHmed consortium aims to create a shared information governance framework for integrating clinical routine data. One challenge is replacing unstructured reporting (e.g. physicians' letters) with structured reporting in clinical routine. The Heidelberg cardiology department is evaluating dynamic PDF forms for the structured reporting of data on heart failure (HF) patients. In this use case, we aim to identify potential caveats and shortcomings in data processing at an early stage. We employed data mining strategies to detect patterns related to incomplete or false data, which we found across all data types. We then discuss the characteristics of the baseline patient cohort in Heidelberg to identify peculiarities and potential biases that may be site-specific. In brief, our patient population is predominantly male (67%); NYHA classes I and II are the most common severity classes, and NYHA class IV is absent entirely. Most patients have dilated cardiomyopathy (DCM) or coronary heart disease (CHD) diagnosed as the cause of their HF. Finally, we analyzed how comorbidities and risk factors relate to specific disease entities in heart failure patients. A positive family history was more frequent among cardiomyopathy patients than among CHD patients, in whom dyslipidemia was more prevalent instead. Overall, arterial hypertension was the most prevalent risk factor, whereas alcoholism, at the other end of the scale, appears to be underreported.
Keywords: HiGHmed; comorbidity; data analysis; data quality; electronic health records; heart failure; measurement methods; missing data; openEHR; risk factor.