Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming

Bioinformatics. 2011 May 15;27(10):1351-8. doi: 10.1093/bioinformatics/btr151. Epub 2011 Mar 30.

Abstract

Motivation: The introduction of next-generation sequencing techniques and especially the high-throughput systems Solexa (Illumina Inc.) and SOLiD (ABI) made the mapping of short reads to reference sequences a standard application in modern bioinformatics. Short-read alignment is needed for reference based re-sequencing of complete genomes as well as for gene expression analysis based on transcriptome sequencing. Several approaches were developed during the last years allowing for a fast alignment of short sequences to a given template. Methods available to date use heuristic techniques to gain a speedup of the alignments, thereby missing possible alignment positions. Furthermore, most approaches return only one best hit for every query sequence, thus losing the potentially valuable information of alternative alignment positions with identical scores.

Results: We developed SARUMAN (Semiglobal Alignment of short Reads Using CUDA and NeedleMAN-Wunsch), a mapping approach that returns all possible alignment positions of a read in a reference sequence under a given error threshold, together with one optimal alignment for each of these positions. Alignments are computed in parallel on graphics hardware, facilitating an considerable speedup of this normally time-consuming step. Combining our filter algorithm with CUDA-accelerated alignments, we were able to align reads to microbial genomes in time comparable or even faster than all published approaches, while still providing an exact, complete and optimal result. At the same time, SARUMAN runs on every standard Linux PC with a CUDA-compatible graphics accelerator.

Availability: http://www.cebitec.uni-bielefeld.de/brf/saruman/saruman.html.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • Computational Biology
  • Gene Expression Profiling
  • Genome, Bacterial*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA
  • Software