More Accurate Transcript Assembly via Parameter Advising

J Comput Biol. 2020 Aug;27(8):1181-1189. doi: 10.1089/cmb.2019.0286. Epub 2020 Apr 21.

Abstract

Computational tools used for genomic analyses are becoming more accurate but also increasingly sophisticated and complex. As a result, these tools expose a large number of tunable parameters, and the chosen values often strongly influence the reported results. We quantify the impact of parameter choice on transcript assembly and take first steps toward a truly automated genomic analysis pipeline by developing a method that automatically chooses input-specific parameter values for reference-based transcript assembly with the Scallop tool. Choosing parameter values for each input increases the area under the receiver operating characteristic curve (AUC), computed by comparing assembled transcripts to a reference transcriptome, by an average of 28.9% over the default parameter choices on 1595 RNA-Seq samples from the Sequence Read Archive. The approach is general: applied to StringTie, it increases the AUC by an average of 13.1% on a set of 65 RNA-Seq experiments from ENCODE. Parameter advisors for both Scallop and StringTie are available on GitHub.
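The idea of choosing parameter values per input can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an advisor is given a finite set of candidate parameter settings, runs the tool once per candidate, and keeps the setting whose output scores highest. The names `advise`, `run_tool`, `estimate_accuracy`, and the parameter `threshold` are hypothetical, and the toy functions stand in for an assembler run and an AUC computation.

```python
from typing import Callable, Dict, Iterable, Tuple

Params = Dict[str, float]


def advise(
    candidates: Iterable[Params],
    run_tool: Callable[[Params], object],
    estimate_accuracy: Callable[[object], float],
) -> Tuple[Params, float]:
    """Return the candidate setting with the highest estimated accuracy.

    Hypothetical sketch: `run_tool` stands in for one assembler run on a
    fixed input, and `estimate_accuracy` stands in for scoring its output
    (for example, an AUC against a reference transcriptome).
    """
    best_params = None
    best_score = float("-inf")
    for params in candidates:
        output = run_tool(params)          # one tool run per candidate
        score = estimate_accuracy(output)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    if best_params is None:
        raise ValueError("candidate set is empty")
    return best_params, best_score


# Toy stand-ins (assumptions for illustration only, not real tool behavior).
def toy_run(params: Params) -> float:
    return params["threshold"]


def toy_score(output: float) -> float:
    # Accuracy peaks when the hypothetical threshold equals 3.0.
    return 1.0 - abs(output - 3.0)


candidates = [{"threshold": 1.0}, {"threshold": 3.0}, {"threshold": 10.0}]
best_params, best_score = advise(candidates, toy_run, toy_score)
print(best_params, best_score)
```

In this sketch the cost grows linearly with the number of candidates, since each one requires a full run of the tool on the input.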

Keywords: automated bioinformatics; genomics; parameter advising; transcript assembly.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / trends*
  • Genome / genetics*
  • Genomics
  • Molecular Sequence Annotation
  • RNA / genetics
  • Sequence Analysis, RNA / methods*
  • Software*
  • Transcriptome / genetics

Substances

  • RNA