EPGA2: memory-efficient de novo assembler

Bioinformatics. 2015 Dec 15;31(24):3988-90. doi: 10.1093/bioinformatics/btv487. Epub 2015 Aug 26.

Abstract

Motivation: In genome assembly, as sequencing coverage and genome size grow, most current software requires a large amount of memory to handle the volume of sequence data. However, many researchers cannot meet these computing-resource requirements, which prevents most current software from being used in practice.

Results: In this article, we present an updated algorithm called EPGA2, which adds several new modules and produces improved assembly results with a small memory footprint. To reduce peak memory during genome assembly, EPGA2 adopts the memory-efficient DSK to count k-mers and a revised BCALM to construct the de Bruijn graph. Moreover, EPGA2 parallelizes the contig-merging step and adds error correction to its pipeline. Our experiments demonstrate that these changes make EPGA2 more useful for genome assembly.
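As an illustration of the two memory-sensitive steps named above, the following is a minimal Python sketch of k-mer counting done one partition at a time, followed by derivation of de Bruijn graph edges from the counted k-mers. It is not the DSK, BCALM or EPGA2 implementation. The function names, the CRC32-based partitioning rule, the `min_count` threshold and the toy reads are all assumptions made for illustration, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
from collections import Counter
import zlib


def kmers(read, k):
    """Yield every k-mer (substring of length k) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers_partitioned(reads, k, num_partitions=4):
    """Yield (kmer, count) pairs, processing one hash partition at a time.

    Only the counts of the current partition are held in memory, so peak
    memory is bounded by the largest partition instead of the whole k-mer
    set. The price is one pass over the reads per partition.
    (Hypothetical helper; the partitioning rule is an assumption.)
    """
    for p in range(num_partitions):
        part = Counter()
        for read in reads:
            for km in kmers(read, k):
                # A deterministic hash assigns each k-mer to one partition.
                if zlib.crc32(km.encode()) % num_partitions == p:
                    part[km] += 1
        yield from part.items()


def de_bruijn_edges(kmer_counts, min_count=1):
    """Return edges (prefix, suffix) between (k-1)-mers.

    Each k-mer seen at least min_count times contributes one edge from
    its first k-1 characters to its last k-1 characters.
    """
    return {(km[:-1], km[1:])
            for km, c in kmer_counts.items() if c >= min_count}


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]  # toy reads, not real data
    counts = dict(count_kmers_partitioned(reads, k=3))
    for edge in sorted(de_bruijn_edges(counts)):
        print(edge)
```

In this sketch the final `dict(...)` call gathers all counts in memory only so that the toy example stays short. A memory-bounded pipeline would instead write each partition's counts to disk as they are produced.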

Availability and implementation: EPGA2 is publicly available for download at https://github.com/bioinfomaticsCSU/EPGA2.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genomics / methods*
  • Sequence Analysis, DNA
  • Software*