Integrating rare disease medical databases from different countries is an important problem, because reliable statistical inference from patient data, and hence clinical research, requires a large number of observations. In the European FAIRVASC project, such integration of national registry data, which requires harmonizing heterogeneous data sets into a unified view, is facilitated by a domain-specific ontology. The FAIRVASC project is dedicated to the rare disease anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV). This paper focuses on the practical issues and challenges encountered while integrating the Polish national database POLVAS into the federated database within the FAIRVASC project. It discusses the use of ontology-based methods for data integration and the importance of ensuring patient privacy and data protection. It addresses three problems: information missing from POLVAS that can be derived by aggregating other data available within the database, incompatible data types and formats, and the mapping of Polish data names to the common vocabulary. Modifications to the mappings used to 'uplift' national data into a Resource Description Framework (RDF) triplestore are also proposed. The described methods allow the Polish national database to be integrated into the European network over which federated queries are performed.
Keywords: Data integration; Federated queries; Ontologies; Rare diseases.
Copyright © 2024 Elsevier Ltd. All rights reserved.