SuRankCo: supervised ranking of contigs in de novo assemblies

BMC Bioinformatics. 2015 Jul 30;16:240. doi: 10.1186/s12859-015-0644-7.

Abstract

Background: Evaluating the quality and reliability of a de novo assembly, and of single contigs in particular, is challenging because a ground truth is usually not readily available and numerous factors may influence the results. Currently available procedures provide assembly scores but lack a comparative quality ranking of contigs within an assembly.

Results: We present SuRankCo, which relies on a machine learning approach to predict quality scores for contigs and to enable the ranking of contigs within an assembly. The result is a sorted contig set which allows selective contig usage in downstream analysis. Benchmarking on datasets with known ground truth shows promising sensitivity and specificity and favorable comparison to existing methodology.

Conclusions: SuRankCo analyzes the reliability of de novo assemblies on the contig level and thereby allows quality control and ranking prior to further downstream and validation experiments.
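
Illustrative sketch: the abstract describes the general idea of learning contig quality scores from assemblies with known ground truth and then sorting the contigs of a new assembly by predicted score. The following minimal Python example illustrates that idea only. It is not SuRankCo's implementation: the abstract does not specify the classifier or the features, so the logistic regression model, the two features (coverage evenness and fraction of properly paired reads), the contig names, and all numeric values below are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression model by batch gradient descent.

    features: list of equal-length numeric tuples, one per training contig
    labels:   1 for a contig judged correct against the ground truth, else 0
    Returns (weights, bias).
    """
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            err = pred - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_score(model, x):
    """Predicted probability that a contig is reliable (between 0 and 1)."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def rank_contigs(model, contigs):
    """Return (name, score) pairs sorted from most to least reliable."""
    scored = [(name, predict_score(model, x)) for name, x in contigs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical training set from an assembly with known ground truth.
# Each tuple: (coverage evenness, fraction of properly paired reads), both in [0, 1].
train_X = [(0.90, 0.95), (0.80, 0.90), (0.85, 0.80),
           (0.30, 0.40), (0.20, 0.50), (0.40, 0.20)]
train_y = [1, 1, 1, 0, 0, 0]

model = train_logistic(train_X, train_y)

# Hypothetical contigs from a new assembly without ground truth.
new_contigs = {
    "contig_A": (0.88, 0.92),
    "contig_B": (0.25, 0.35),
    "contig_C": (0.60, 0.60),
}

ranking = rank_contigs(model, new_contigs)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

The sorted list can then be cut at a score threshold, or only its top portion passed on, which corresponds to the selective contig usage in downstream analysis mentioned in the Results.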

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Contig Mapping / methods*
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • ROC Curve
  • Software*