Integrating Genomics and Clinical Data for Statistical Analysis by Using GEnome MINIng (GEMINI) and Fast Healthcare Interoperability Resources (FHIR): System Design and Implementation

J Med Internet Res. 2020 Oct 7;22(10):e19879. doi: 10.2196/19879.

Abstract

Background: The introduction of next-generation sequencing (NGS) into molecular cancer diagnostics has led to an increase in the data available for the identification and evaluation of driver mutations and for defining personalized cancer treatment regimens. However, meaningfully combining omics data (ie, pathogenic gene variants and alterations) with other patient data to understand the full picture of a malignancy has been challenging.

Objective: This study describes the implementation of a system capable of processing, analyzing, and subsequently combining NGS data with other clinical patient data for analysis within and across institutions.

Methods: On the basis of the existing NGS analysis workflows for the identification of malignant gene variants at the Institute of Pathology of the University Hospital Erlangen, we defined basic requirements for an NGS processing and analysis pipeline and implemented a pipeline based on the GEMINI (GEnome MINIng) open source genetic variation database. For validation, this pipeline was applied to data from the 1000 Genomes Project and subsequently to NGS data derived from 206 patients of a local hospital. We further integrated the pipeline into the existing structures of the data integration center at the University Hospital Erlangen and combined NGS data with local nongenomic patient-derived data available in Fast Healthcare Interoperability Resources (FHIR) format.
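The combination step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the variant rows, the patient identifiers, the function names, and the choice of the FHIR Condition resource as the nongenomic source are all assumptions made for illustration. Only the `subject.reference` and `code.coding` fields follow the published FHIR resource structure.

```python
from collections import defaultdict

# Hypothetical variant rows, e.g. as exported from a variant database query.
variants = [
    {"patient_id": "p1", "gene": "KRAS", "impact": "missense_variant"},
    {"patient_id": "p1", "gene": "TP53", "impact": "stop_gained"},
    {"patient_id": "p2", "gene": "BRAF", "impact": "missense_variant"},
]

# Hypothetical FHIR Condition resources, reduced to the fields used below.
conditions = [
    {
        "resourceType": "Condition",
        "subject": {"reference": "Patient/p1"},
        "code": {"coding": [{"system": "http://hl7.org/fhir/sid/icd-10",
                             "code": "C18.9"}]},
    },
    {
        "resourceType": "Condition",
        "subject": {"reference": "Patient/p2"},
        "code": {"coding": [{"system": "http://hl7.org/fhir/sid/icd-10",
                             "code": "C43.9"}]},
    },
]


def patient_id_from_reference(reference):
    """Extract the logical id from a reference such as 'Patient/p1'."""
    return reference.rsplit("/", 1)[-1]


def combine(variant_rows, condition_resources):
    """Join variant rows and Condition resources on the patient id."""
    genes_by_patient = defaultdict(set)
    for row in variant_rows:
        genes_by_patient[row["patient_id"]].add(row["gene"])

    combined = []
    for resource in condition_resources:
        if resource.get("resourceType") != "Condition":
            continue  # ignore any other resource type in the input
        pid = patient_id_from_reference(resource["subject"]["reference"])
        codes = [coding["code"] for coding in resource["code"]["coding"]]
        combined.append({
            "patient_id": pid,
            "diagnosis_codes": codes,
            "mutated_genes": sorted(genes_by_patient.get(pid, set())),
        })
    return combined


if __name__ == "__main__":
    for record in combine(variants, conditions):
        print(record)
```

Each resulting record holds one diagnosis together with the genes found mutated in the same patient, which is the kind of flat table a downstream statistical analysis could consume.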

Results: Using data from the 1000 Genomes Project and from the patient cohort as input, the implemented system produced the same results as the established methodologies. Further, it satisfied all our identified requirements and was successfully integrated into the existing infrastructure. Finally, we showed in an example analysis how the data could be quickly loaded into and analyzed in KETOS, a web-based platform for statistical analysis and clinical decision support.

Conclusions: This study demonstrates that the GEMINI open source database can be augmented to create an NGS analysis pipeline. The pipeline generates high-quality results consistent with the established workflows for gene variant annotation and pathological evaluation. We further demonstrate how NGS-derived genomic data and other clinical data can be combined for statistical analysis, thereby enabling data integration using standardized vocabularies and methods. Finally, we demonstrate the feasibility of integrating the pipeline into hospital workflows by providing an example integration into the data integration center infrastructure, which is currently being established across Germany.

Keywords: Fast Healthcare Interoperability Resources; GEnome MINIng; data analysis; data standardization; genetic databases; next-generation sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Decision Support Systems, Clinical / standards*
  • Delivery of Health Care / methods*
  • Genomics / methods*
  • Health Information Interoperability / standards*
  • Humans
  • Internet / standards*
  • Machine Learning / standards*