Integrating and analyzing medical and environmental data using ETL and Business Intelligence tools

Int J Biometeorol. 2018 Jun;62(6):1085-1095. doi: 10.1007/s00484-018-1511-9. Epub 2018 Mar 7.

Abstract

Processing data that originates from different sources (such as environmental and medical data) can prove to be a difficult task, due to the heterogeneity of the variables, storage systems, and file formats involved. Moreover, once the amount of data reaches a certain threshold, conventional mining methods (based on spreadsheets or statistical software) become cumbersome or even impossible to apply. Data Extract, Transform, and Load (ETL) solutions provide a framework to normalize and integrate heterogeneous data into a local data store. Additionally, Online Analytical Processing (OLAP), a set of Business Intelligence (BI) methodologies and practices for multidimensional data analysis, can be an invaluable tool for examining and mining the integrated data. In this article, we describe a solution based on an ETL + OLAP tandem used for the on-the-fly analysis of tens of millions of individual medical, meteorological, and air quality observations from 16 provinces in Spain. The data were provided by 20 different national and regional entities in a diverse array of file types and formats, and were integrated with the intention of evaluating the effect of several environmental variables on human health in future studies. Our work shows how a sizable amount of data, spread across a wide range of file formats and structures and originating from a number of different sources belonging to various business domains, can be integrated into a single system that researchers can use for global data analysis and mining.
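The ETL + OLAP tandem described above can be illustrated with a minimal sketch. It is not the system built in the article: the sample records, field names, table layout, and function names below are all hypothetical, and an in-memory SQLite database stands in for the local data store. Two sources with different file formats, naming, and date conventions are normalized into one observation table (the ETL step), and a grouped aggregate then produces an OLAP-style roll-up by province, month, and variable.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: hospital admissions as semicolon-delimited CSV.
ADMISSIONS_CSV = """province;date;admissions
Madrid;2015-01-01;120
Madrid;2015-01-02;135
Sevilla;2015-01-01;80
"""

# Hypothetical source 2: air quality as JSON, with different field names,
# upper-case province names, and DD/MM/YYYY dates.
AIR_QUALITY_JSON = """[
  {"prov": "MADRID", "day": "01/01/2015", "no2": 40.0},
  {"prov": "MADRID", "day": "02/01/2015", "no2": 60.0},
  {"prov": "SEVILLA", "day": "01/01/2015", "no2": 30.0}
]"""


def extract_admissions(text):
    """Extract + transform: yield (province, ISO date, variable, value)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    for row in reader:
        yield (row["province"].strip().lower(), row["date"],
               "admissions", float(row["admissions"]))


def extract_air_quality(text):
    """Extract + transform: normalize names and convert dates to ISO format."""
    for rec in json.loads(text):
        day, month, year = rec["day"].split("/")
        yield (rec["prov"].strip().lower(), f"{year}-{month}-{day}",
               "no2", float(rec["no2"]))


def load(rows):
    """Load: store the normalized rows in a single observation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation "
        "(province TEXT, date TEXT, variable TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    return conn


def rollup_by_month(conn):
    """OLAP-style roll-up: mean value per province, month, and variable."""
    query = """
        SELECT province, substr(date, 1, 7) AS month, variable, AVG(value)
        FROM observation
        GROUP BY province, month, variable
        ORDER BY province, month, variable
    """
    return conn.execute(query).fetchall()


rows = list(extract_admissions(ADMISSIONS_CSV))
rows += list(extract_air_quality(AIR_QUALITY_JSON))
conn = load(rows)
for line in rollup_by_month(conn):
    print(line)
```

Storing every observation as a (province, date, variable, value) row is one simple way to let sources with unrelated schemas share a single table; a production data warehouse would more typically use separate dimension and fact tables.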

Keywords: Business intelligence; Data integration; Data mining; Environmental data; Medical data.

MeSH terms

  • Air Pollution
  • Database Management Systems
  • Databases, Factual
  • Emergency Service, Hospital / statistics & numerical data
  • Hospitalization / statistics & numerical data
  • Humans
  • Information Storage and Retrieval / methods*
  • Spain
  • Systems Integration*
  • Weather