A Privacy-Preserving Infrastructure for Analyzing Personal Health Data in a Vertically Partitioned Scenario

Stud Health Technol Inform. 2019 Aug 21;264:373-377. doi: 10.3233/SHTI190246.

Abstract

It is widely anticipated that the use and analysis of health-related big data will enable further understanding and improvements in human health and wellbeing. Here, we propose an innovative infrastructure that supports secure and privacy-preserving analysis of personal health data from multiple providers with different governance policies. Our objective is to use this infrastructure to explore the relation between Type 2 Diabetes Mellitus status and healthcare costs. Our approach involves the use of distributed machine learning to analyze vertically partitioned data from the Maastricht Study, a prospective population-based cohort study, and data from the official statistics agency of the Netherlands, Statistics Netherlands (Centraal Bureau voor de Statistiek; CBS). This project seeks an optimal solution accounting for scientific, technical, and ethical/legal challenges. We describe these challenges, our progress towards addressing them in a practical use case, and a simulation experiment.
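
To illustrate what analyzing vertically partitioned data can look like, the following is a minimal sketch. It is not the infrastructure or algorithm described in this paper. It assumes two hypothetical parties that hold different columns for the same, already linked individuals: party A holds a diabetes flag, and party B holds an intercept, one covariate, and the cost outcome. Each party keeps its own weights, and only per-person partial scores and residuals are exchanged. All function names and the toy data are invented for illustration. A real deployment would also need record linkage and protection (for example, encryption) of the exchanged values, which this sketch omits.

```python
# Illustrative sketch only: linear regression by gradient descent where the
# feature columns are split between two hypothetical parties (A and B).
# Raw feature columns never leave their owner; only partial scores and
# residuals are shared. This sharing is NOT privacy-preserving by itself.

def partial_scores(X, w):
    """Each party computes its own contribution to the prediction."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]

def local_gradient(X, residuals):
    """Gradient of the mean squared error w.r.t. the party's own weights."""
    n = len(X)
    return [
        sum(row[j] * r for row, r in zip(X, residuals)) / n
        for j in range(len(X[0]))
    ]

def fit_vertical(X_a, X_b, y, lr=0.1, epochs=5000):
    """Fit y ~ X_a @ w_a + X_b @ w_b; party B is assumed to hold y."""
    w_a = [0.0] * len(X_a[0])
    w_b = [0.0] * len(X_b[0])
    for _ in range(epochs):
        s_a = partial_scores(X_a, w_a)   # computed at party A, sent to B
        s_b = partial_scores(X_b, w_b)   # computed at party B
        # Party B combines the scores with the outcome it holds.
        residuals = [sa + sb - yi for sa, sb, yi in zip(s_a, s_b, y)]
        # Residuals are sent back; each party updates only its own weights.
        g_a = local_gradient(X_a, residuals)
        g_b = local_gradient(X_b, residuals)
        w_a = [w - lr * g for w, g in zip(w_a, g_a)]
        w_b = [w - lr * g for w, g in zip(w_b, g_b)]
    return w_a, w_b

# Made-up toy data for six linked individuals (not real study data).
flags = [0, 1, 0, 1, 1, 0]                    # held by party A
covar = [0.1, 0.4, -0.3, 0.8, -0.5, 0.2]      # held by party B
X_a = [[float(f)] for f in flags]
X_b = [[1.0, z] for z in covar]               # intercept + covariate
y = [1.0 + 2.0 * f + 0.5 * z for f, z in zip(flags, covar)]  # held by B

w_a, w_b = fit_vertical(X_a, X_b, y)
print("party A weights:", w_a)
print("party B weights:", w_b)
```

Because both parties update their weights from the same residuals, the procedure is equivalent to ordinary gradient descent on the joined table, without the table ever being materialized in one place.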

Keywords: Data Science; Health Information Systems; Machine Learning.

MeSH terms

  • Diabetes Mellitus, Type 2
  • Health Records, Personal
  • Humans
  • Netherlands
  • Privacy*
  • Prospective Studies