A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads

Tobias Rausch; Sergey Koren; Gennady Denisov; David Weese; Anne-Katrin Emde; Andreas Döring; Knut Reinert

doi:10.1093/bioinformatics/btp131

Bioinformatics. 2009 May 1;25(9):1118-24. doi: 10.1093/bioinformatics/btp131. Epub 2009 Mar 5.

Authors

Tobias Rausch¹, Sergey Koren, Gennady Denisov, David Weese, Anne-Katrin Emde, Andreas Döring, Knut Reinert

Affiliation

¹ International Max Planck Research School for Computational Biology and Scientific Computing, Ihnestr. 63-73, Algorithmische Bioinformatik, Institut für Informatik, Takustr. 9, 14195 Berlin, Germany.

Abstract

Motivation: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing.

Results: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph. On real de novo sequencing data obtained from the newly established NCBI Short Read Archive, the program performs similarly in quality to other comparable programs. On more challenging simulated datasets for insert sequencing and variation analyses, our program outperforms the other tools.
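The abstract does not spell out how the alignment graph is "consistency-enhanced". As a rough illustration only, the sketch below shows the generic triplet-extension idea used in consistency-based alignment: a match between two segments gains weight when both also match a common third segment. The function name, the `(read_id, segment_index)` node encoding, and the example weights are hypothetical and are not taken from the published program.

```python
from collections import defaultdict
from itertools import combinations


def extend_by_consistency(edges):
    """Return edge weights after one round of triplet extension.

    `edges` maps frozenset({u, v}) -> weight, where each node is a
    (read_id, segment_index) tuple. This encoding is an assumption made
    for the sketch, not the data structure used by the actual tool.
    """
    # Build an adjacency view so every node knows its matched partners.
    neighbors = defaultdict(dict)
    for pair, weight in edges.items():
        u, v = tuple(pair)
        neighbors[u][v] = weight
        neighbors[v][u] = weight

    extended = dict(edges)
    for c, adj in neighbors.items():
        # Two segments a and b that both match c support each other.
        for a, b in combinations(sorted(adj), 2):
            if a[0] == b[0]:
                continue  # segments of the same read are never aligned to each other
            key = frozenset((a, b))
            extended[key] = extended.get(key, 0) + min(adj[a], adj[b])
    return extended


# Hypothetical pairwise match weights between one segment of each of three reads.
example_edges = {
    frozenset({("r1", 0), ("r2", 0)}): 5,
    frozenset({("r2", 0), ("r3", 0)}): 3,
    frozenset({("r1", 0), ("r3", 0)}): 1,
}
example_extended = extend_by_consistency(example_edges)
```

In this toy input the weak match between reads `r1` and `r3` is reinforced because both segments also match the same segment of `r2`, which is the kind of mutual support a consistency scheme is meant to reward.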

Availability: The consensus program can be downloaded from http://www.seqan.de/projects/consensus.html. It can be used stand-alone or in conjunction with the Celera Assembler. Both application scenarios as well as the usage of the tool are described in the documentation.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms*
Base Sequence
Computational Biology / methods
Internet
Molecular Sequence Data
Sequence Alignment / methods*
Sequence Analysis, DNA / methods
