Distributed data networks: a blueprint for Big Data sharing and healthcare analytics

Ann N Y Acad Sci. 2017 Jan;1387(1):105-111. doi: 10.1111/nyas.13287. Epub 2016 Nov 18.

Abstract

This paper defines the attributes of distributed data networks and outlines the data and analytic infrastructure needed to build and maintain a successful network. We use examples from one successful implementation of a large-scale, multisite, healthcare-related distributed data network, the U.S. Food and Drug Administration-sponsored Sentinel Initiative. Analytic infrastructure-development concepts are discussed from the perspective of promoting six pillars of analytic infrastructure: consistency, reusability, flexibility, scalability, transparency, and reproducibility. This paper also introduces one use case for machine learning algorithm development to fully utilize and advance the portfolio of population health analytics, particularly those using multisite administrative data sources.
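
As an illustration of the distributed pattern the abstract refers to, the following is a minimal sketch. It assumes each site holds its own records in a shared schema (a common data model), runs the same query locally, and returns only aggregate counts to a coordinating center. The names `Dispensing`, `count_exposed_and_events`, `run_distributed`, and `pool` are hypothetical and do not come from the paper or from Sentinel's actual data model or tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Dispensing:
    """One row of a hypothetical common data model table (illustrative only)."""
    patient_id: str
    drug_code: str
    had_outcome: bool


Aggregate = Dict[str, int]


def count_exposed_and_events(rows: List[Dispensing], drug_code: str) -> Aggregate:
    """Run locally at one site: count distinct exposed patients and those with the outcome."""
    exposed = {r.patient_id for r in rows if r.drug_code == drug_code}
    events = {r.patient_id for r in rows if r.drug_code == drug_code and r.had_outcome}
    return {"exposed": len(exposed), "events": len(events)}


def run_distributed(
    sites: Dict[str, List[Dispensing]],
    query: Callable[[List[Dispensing]], Aggregate],
) -> Dict[str, Aggregate]:
    """Apply the same query to every site; only aggregates leave each site."""
    return {name: query(rows) for name, rows in sites.items()}


def pool(results: Dict[str, Aggregate]) -> Aggregate:
    """Sum site-level aggregates at the coordinating center."""
    total: Aggregate = {}
    for aggregate in results.values():
        for key, value in aggregate.items():
            total[key] = total.get(key, 0) + value
    return total


# Made-up example data for two sites; no real patients or drugs.
sites = {
    "site_a": [
        Dispensing("p1", "D1", False),
        Dispensing("p2", "D1", True),
        Dispensing("p3", "D2", False),
    ],
    "site_b": [
        Dispensing("p4", "D1", True),
        Dispensing("p4", "D1", True),  # repeat dispensing, same patient
        Dispensing("p5", "D1", False),
    ],
}

site_results = run_distributed(sites, lambda rows: count_exposed_and_events(rows, "D1"))
pooled = pool(site_results)
print(site_results)
print(pooled)
```

Because every site receives the same function and the same schema, the sketch touches on consistency and reusability among the six pillars. Patient-level rows never appear in `site_results`, which holds only per-site counts.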

Keywords: Big Data; common data model; distributed data network; healthcare analytics; machine learning algorithm.

Publication types

  • Review

MeSH terms

  • Access to Information*
  • Algorithms
  • Computational Biology / instrumentation
  • Computational Biology / methods*
  • Computational Biology / trends
  • Computer Communication Networks* / instrumentation
  • Computer Communication Networks* / trends
  • Data Mining / methods*
  • Data Mining / trends
  • Database Management Systems / instrumentation
  • Database Management Systems / trends
  • Decision Making, Computer-Assisted
  • Humans
  • Machine Learning
  • Medical Informatics / instrumentation
  • Medical Informatics / methods
  • Medical Informatics / trends
  • Sentinel Surveillance*
  • United States
  • United States Food and Drug Administration