Critical assessment of assembly strategies for non-model species mRNA-Seq data and application of next-generation sequencing to the comparison of C(3) and C(4) species

J Exp Bot. 2011 May;62(9):3093-102. doi: 10.1093/jxb/err029. Epub 2011 Mar 11.

Abstract

Next-generation sequencing enables the study of species without a sequenced genome at the 'omics' level. Custom transcriptome databases are generated and global expression profiles can be compared. However, the assembly of transcriptome sequence reads into contigs remains a daunting task. In this study, five different assembly programs were compared, covering both traditional overlap-based, 'read-centric' assemblers and assemblers based on the de Bruijn graph data structure. To this end, artificial read libraries with and without simulated sequencing errors were constructed from Arabidopsis thaliana, based on quantitative profiles of mature leaf tissue. The open-source TGICL pipeline and the commercial CLC bio genomics workbench produced the best assemblies in terms of contig length, hybrid assemblies, redundancy reduction, and error tolerance. The mature leaf transcriptomes of the C(3) species Cleome spinosa and the C(4) species Cleome gynandra were assembled and analysed. The pathways and cellular processes tagged in the transcriptome assemblies reflect processes of a mature leaf. The databases are useful for extracting transcripts related to C(4) processes as full-length or nearly full-length sequences.
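
The idea of an artificial read library built from a quantitative expression profile, with or without simulated sequencing errors, can be illustrated with a minimal sketch. This is not the procedure used in the study: the function name `simulate_reads`, its parameters, the uniform substitution-only error model, and the toy sequences are all assumptions made for illustration.

```python
import random


def simulate_reads(transcripts, abundances, n_reads, read_len,
                   error_rate=0.0, seed=0):
    """Sample fixed-length reads from transcripts in proportion to abundance.

    transcripts: dict mapping a transcript name to its sequence (A/C/G/T).
    abundances:  dict mapping a transcript name to a relative expression value.
    error_rate:  per-base probability of a substitution (hypothetical model;
                 real platforms also produce indels and position-dependent errors).
    Returns a list of (transcript_name, start, read_sequence) tuples.
    """
    rng = random.Random(seed)
    # Only transcripts at least as long as one read can be sampled.
    names = [n for n in transcripts if len(transcripts[n]) >= read_len]
    weights = [abundances[n] for n in names]
    reads = []
    # Highly expressed transcripts are drawn more often, mimicking mRNA-Seq coverage.
    for name in rng.choices(names, weights=weights, k=n_reads):
        seq = transcripts[name]
        start = rng.randint(0, len(seq) - read_len)
        read = list(seq[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitute with one of the three other bases.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((name, start, "".join(read)))
    return reads


if __name__ == "__main__":
    # Toy input; real libraries would use full transcript sequences and
    # measured expression values.
    toy_transcripts = {
        "geneA": "ACGTACGTACGTACGTACGT",
        "geneB": "TTTTGGGGCCCCAAAATTTT",
    }
    toy_abundances = {"geneA": 10.0, "geneB": 1.0}
    for record in simulate_reads(toy_transcripts, toy_abundances,
                                 n_reads=5, read_len=8, error_rate=0.01):
        print(record)
```

Generating one library with `error_rate=0.0` and one with a non-zero rate from the same profile gives a paired test set: because the source transcript and position of every read are known, assembled contigs can be checked against the truth, which is what makes comparisons of contig length, redundancy, and error tolerance possible.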

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arabidopsis / chemistry
  • Arabidopsis / genetics*
  • Base Sequence
  • Cleome / chemistry
  • Cleome / genetics*
  • Computer Simulation
  • Contig Mapping / methods*
  • DNA, Complementary / genetics
  • Databases, Nucleic Acid
  • Gene Library
  • Genome, Plant / genetics*
  • High-Throughput Nucleotide Sequencing
  • Models, Genetic
  • Molecular Sequence Data
  • Plant Leaves / chemistry
  • Plant Leaves / genetics
  • Polymorphism, Single Nucleotide
  • RNA, Messenger / chemistry
  • RNA, Messenger / genetics
  • RNA, Plant / genetics
  • Sequence Analysis, RNA
  • Software
  • Transcriptome*

Substances

  • DNA, Complementary
  • RNA, Messenger
  • RNA, Plant