ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution

IEEE/ACM Trans Comput Biol Bioinform. 2017 Jul-Aug;14(4):916-925. doi: 10.1109/TCBB.2016.2550433. Epub 2016 Apr 5.

Abstract

The goal of de novo assembly is to produce contigs that are more contiguous, more complete, and less error-prone. With the advent of next-generation sequencing (NGS) technologies, the cost of producing high-depth reads has fallen greatly. However, owing to the limitations of NGS, de novo assembly must still contend with repeat regions, sequencing errors, and low coverage in some regions. Although many de novo assembly algorithms have been proposed to address these problems, de novo assembly remains a challenge. In this article, we present an iterative seed-extension algorithm for de novo assembly, called ISEA. To limit the negative impact of sequencing errors, ISEA uses read overlaps and paired-end information to correct erroneous reads before assembly. While extending seeds in a de Bruijn graph, ISEA uses a carefully designed score function based on paired-end information and the insert size distribution to resolve repeat regions. Because it employs the insert size distribution, the score function also reduces the influence of erroneous reads. In scaffolding, ISEA adopts a relaxed strategy to join contigs whose extension was terminated because of low coverage. The performance of ISEA was compared with that of six popular existing assemblers on four real datasets. The experimental results demonstrate that ISEA can effectively obtain longer and more accurate scaffolds.
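
The abstract does not state the score function itself, so the following is only a minimal sketch of the general idea: choosing among candidate extensions at a branch by weighting each supporting read pair by how well its implied insert size fits the library's insert size distribution. The normal-distribution model, the function names (score_candidate, choose_extension), and the max_dev cutoff are assumptions made for illustration and are not taken from ISEA.

```python
import math


def score_candidate(implied_inserts, mean, std, max_dev=3.0):
    """Score one candidate extension (illustrative, not ISEA's formula).

    implied_inserts: insert sizes that supporting read pairs would have
    if this candidate were chosen. Each pair contributes a Gaussian
    weight in (0, 1]; pairs further than max_dev standard deviations
    from the mean contribute nothing, which damps erroneous pairs.
    """
    score = 0.0
    for size in implied_inserts:
        z = (size - mean) / std
        if abs(z) <= max_dev:
            score += math.exp(-0.5 * z * z)
    return score


def choose_extension(candidates, mean, std):
    """Return the best-scoring branch and all scores.

    candidates: dict mapping a branch label to its list of implied
    insert sizes (hypothetical input format).
    """
    scores = {
        label: score_candidate(sizes, mean, std)
        for label, sizes in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Assumed library: mean insert size 300 bp, standard deviation 20 bp.
    branches = {
        "A": [300, 310, 295],  # all pairs fit the distribution
        "C": [300, 900, 50],   # two pairs are implausible
    }
    best, scores = choose_extension(branches, mean=300, std=20)
    print(best, scores)
```

In this sketch a branch supported by many pairs with plausible insert sizes outscores a branch whose support comes mostly from implausible pairs, which is one way a repeat copy in the wrong place could be rejected.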

MeSH terms

  • Algorithms*
  • DNA, Bacterial / analysis
  • DNA, Bacterial / genetics
  • DNA, Fungal / analysis
  • DNA, Fungal / genetics
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Neurospora crassa / genetics
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods
  • Staphylococcus aureus / genetics

Substances

  • DNA, Bacterial
  • DNA, Fungal