Supercomputing for the parallelization of whole genome analysis

Bioinformatics. 2014 Jun 1;30(11):1508-13. doi: 10.1093/bioinformatics/btu071. Epub 2014 Feb 12.

Abstract

Motivation: The declining cost of generating DNA sequence is promoting an increase in whole genome sequencing, especially as applied to the human genome. Whole genome analysis requires the alignment and comparison of raw sequence data, and results in a computational bottleneck because of limited ability to analyze multiple genomes simultaneously.

Results: We now adapted a Cray XE6 supercomputer to achieve the parallelization required for concurrent multiple genome analysis. This approach not only markedly speeds computational time but also results in increased usable sequence per genome. Relying on publically available software, the Cray XE6 has the capacity to align and call variants on 240 whole genomes in ∼50 h. Multisample variant calling is also accelerated.

Availability and implementation: The MegaSeq workflow is designed to harness the size and memory of the Cray XE6, housed at Argonne National Laboratory, for whole genome analysis in a platform designed to better match current and emerging sequencing volume.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computers*
  • Genome, Human*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Software