Integrating Hi-C links with assembly graphs for chromosome-scale assembly

PLoS Comput Biol. 2019 Aug 21;15(8):e1007273. doi: 10.1371/journal.pcbi.1007273. eCollection 2019 Aug.

Abstract

Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, few open-source Hi-C scaffolding tools are available. Hi-C scaffolding errors, particularly inversions and fusions across chromosomes, remain more frequent than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.
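To illustrate why an assembly graph can help Hi-C scaffolding, here is a minimal Python sketch under a deliberately simplified model. Contig ends are `(contig, "B"/"E")` pairs, Hi-C evidence is a raw link count per pair of ends, and the assembly graph is reduced to a set of allowed end adjacencies. The function `greedy_join` and the toy data are hypothetical and are not SALSA's actual algorithm or API. Real scaffolders also normalize link counts, resolve orientation, and detect misassemblies, none of which is modeled here.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A contig end: (contig name, "B" for its start or "E" for its end).
End = Tuple[str, str]


def find(parent: Dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def greedy_join(
    links: Dict[FrozenSet[End], int],
    graph_edges: Set[FrozenSet[End]],
    require_graph: bool = True,
) -> List[Tuple[End, End]]:
    """Accept contig-end joins in order of decreasing Hi-C link count.

    A join is kept only if it connects two different contigs, both ends are
    still free, it does not close a cycle, and (when require_graph is True)
    the two ends are adjacent in the assembly graph.
    """
    contigs = {end[0] for pair in links for end in pair}
    parent = {c: c for c in contigs}
    used: Set[End] = set()
    joins: List[Tuple[End, End]] = []
    # Strongest links first; ties broken by end names for determinism.
    ranked = sorted(links.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _count in ranked:
        a, b = sorted(pair)
        if a[0] == b[0] or a in used or b in used:
            continue  # joins a contig to itself, or an end is already taken
        if require_graph and pair not in graph_edges:
            continue  # no assembly-graph support for this adjacency
        ra, rb = find(parent, a[0]), find(parent, b[0])
        if ra == rb:
            continue  # both contigs already sit in one scaffold: cycle
        parent[ra] = rb
        used.update((a, b))
        joins.append((a, b))
    return joins


# Hypothetical toy data: one strong but spurious link (c1 end to c3 start).
LINKS = {
    frozenset({("c1", "E"), ("c3", "B")}): 60,
    frozenset({("c1", "E"), ("c2", "B")}): 50,
    frozenset({("c2", "E"), ("c3", "B")}): 30,
    frozenset({("c3", "E"), ("c1", "B")}): 20,
}
GRAPH_EDGES = {
    frozenset({("c1", "E"), ("c2", "B")}),
    frozenset({("c2", "E"), ("c3", "B")}),
    frozenset({("c3", "E"), ("c1", "B")}),
}

print(greedy_join(LINKS, GRAPH_EDGES, require_graph=False))
print(greedy_join(LINKS, GRAPH_EDGES))
```

In this toy example, Hi-C counts alone accept the spurious c1-to-c3 join first, which blocks the intended order. Requiring assembly-graph adjacency instead yields the chain c1, c2, c3, and the union-find check rejects the c3-to-c1 link because it would close a cycle.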

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Chromosomes, Human / genetics*
  • Computational Biology
  • Computer Simulation
  • Databases, Nucleic Acid / statistics & numerical data
  • Genome, Human*
  • Genomic Library
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • High-Throughput Nucleotide Sequencing / methods
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, DNA / statistics & numerical data
  • Software