Grouper: graph-based clustering and annotation for improved de novo transcriptome analysis

Bioinformatics. 2018 Oct 1;34(19):3265-3272. doi: 10.1093/bioinformatics/bty378.

Abstract

Motivation: De novo transcriptome analysis using RNA-seq offers a promising means to study gene expression in non-model organisms. Yet, the difficulty of transcriptome assembly means that the contigs provided by the assembler often represent a fractured and incomplete view of the transcriptome, complicating downstream analysis. We introduce Grouper, a new method for clustering contigs from de novo assemblies that are likely to belong to the same transcripts and genes; these groups can subsequently be analyzed more robustly. When provided with access to the genome of a related organism, Grouper can transfer annotations to the de novo assembly, further improving the clustering.

Results: On de novo assemblies from four different species, we show that Grouper is able to accurately cluster a larger number of contigs than the existing state-of-the-art method. The Grouper pipeline is able to map greater than 10% more reads against the contigs, leading to accurate downstream differential expression analyses. The labeling module, in the presence of a closely related annotated genome, can efficiently transfer annotations to the contigs and use this information to further improve clustering. Overall, Grouper provides a complete and efficient pipeline for processing de novo transcriptomic assemblies.

Availability and implementation: The Grouper software is freely available at https://github.com/COMBINE-lab/grouper under the 2-clause BSD license.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Cluster Analysis*
  • Computational Biology
  • Gene Expression Profiling
  • Molecular Sequence Annotation
  • Transcriptome*