Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory

Getiria Onsongo; Jesse Erdmann; Michael D Spears; John Chilton; Kenneth B Beckman; Adam Hauge; Sophia Yohe; Matthew Schomaker; Matthew Bower; Kevin A T Silverstein; Bharat Thyagarajan

doi:10.1186/1756-0500-7-314

BMC Res Notes. 2014 May 23:7:314. doi: 10.1186/1756-0500-7-314.

Authors

Getiria Onsongo, Jesse Erdmann, Michael D Spears, John Chilton, Kenneth B Beckman, Adam Hauge, Sophia Yohe, Matthew Schomaker, Matthew Bower, Kevin A T Silverstein¹, Bharat Thyagarajan

Affiliation

¹ Research Informatics Support Systems, Minnesota Supercomputing Institute, University of Minnesota, Room 599 Walter Library 117 Pleasant St SE, Minneapolis, MN 55455, USA.

Abstract

Background: The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges still limit the widespread adoption of NGS testing in clinical practice. One such challenge is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive for most clinical diagnostics laboratories.

Findings: To address this challenge, our institution has developed a Galaxy-based data analysis pipeline that relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides the additional flexibility needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require an EBS disk to run a sample.

Conclusions: We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in the mutations identified. This pipeline is currently used in the clinic, and all identified pathogenic variants have been confirmed using Sanger sequencing, further validating the software.
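
The concordance check described above can be illustrated with a minimal sketch. The abstract does not say how concordance was computed, so the definition used here (variants shared by both replicates divided by all variants seen in either) is an assumption, and the function name and variant tuples are hypothetical placeholders rather than data from the study.

```python
def concordance(run_a, run_b):
    """Fraction of variants shared by two replicate runs.

    Assumed definition: size of the intersection divided by the size of
    the union of the two call sets. This is not taken from the paper.
    """
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(a & b) / len(union)


# Hypothetical variant calls as (chromosome, position, ref, alt) tuples.
replicate_1 = [("chr1", 1000, "A", "G"), ("chr2", 2500, "C", "T")]
replicate_2 = [("chr2", 2500, "C", "T"), ("chr1", 1000, "A", "G")]

print(f"{concordance(replicate_1, replicate_2):.0%}")
```

Under this definition, a duplicate pair is fully concordant only when both runs report exactly the same set of variants, regardless of the order in which they are listed.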

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Clinical Laboratory Techniques* / economics
High-Throughput Nucleotide Sequencing / economics
High-Throughput Nucleotide Sequencing / methods*
Humans
Internet* / economics
Reproducibility of Results
Sequence Analysis, DNA / economics
Sequence Analysis, DNA / methods*
Statistics as Topic*