Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly

PLoS One. 2010 May 7;5(5):e10517. doi: 10.1371/journal.pone.0010517.

Abstract

Background: Sequencing full-length cDNA clones is important for determining gene structures, including alternative splice forms, and provides valuable resources for experimental analyses of the biological functions of the encoded proteins. However, previous approaches to sequencing cDNA clones were expensive or time-consuming, so a fast and efficient sequencing approach was needed.

Methodology: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow-cell lane of an Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly: an external de novo assembler is run first, and its result is then improved by reference alignment of the shotgun reads. We compared the MuSICA 2 assemblies of 200 pooled full-length cDNA clones with sequences finished independently by conventional primer walking on Sanger sequencers. After excluding clones that were insufficiently represented in the shotgun library because of PCR failure (42 of the 200), the exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation, and the nucleotide-level accuracy of the coding sequences of those correct clones exceeded 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii to confirm that it is also effective for non-human species.
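The second stage of a hybrid assembly, in which reads are aligned back to a draft and used to correct it, can be illustrated with a toy example. The sketch below is not MuSICA 2's algorithm. It assumes that the de novo contig serves as the alignment reference, that errors are substitutions only (no indels), and that reads are placed by brute-force ungapped alignment; each draft base is then re-called by majority vote. The function names `best_offset` and `polish` are hypothetical, and the 8-nt reads are far shorter than the 36-nt reads described above.

```python
from collections import Counter


def best_offset(read: str, ref: str) -> int:
    """Return the ungapped offset in ref where read has the fewest mismatches.

    Ties go to the leftmost offset. Assumes len(read) <= len(ref).
    """
    best, best_mm = 0, len(read) + 1
    for off in range(len(ref) - len(read) + 1):
        window = ref[off:off + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm < best_mm:
            best, best_mm = off, mm
    return best


def polish(draft: str, reads: list[str]) -> str:
    """Re-call each draft base by majority vote of the reads aligned to it.

    Positions not covered by any read keep the draft base.
    Reads longer than the draft are skipped.
    """
    votes = [Counter() for _ in draft]
    for read in reads:
        if len(read) > len(draft):
            continue
        off = best_offset(read, draft)
        for i, base in enumerate(read):
            votes[off + i][base] += 1
    return "".join(
        v.most_common(1)[0][0] if v else d for d, v in zip(draft, votes)
    )


if __name__ == "__main__":
    true_seq = "ATGGCGTACCTGAAGTTCGA"
    draft = "ATGGCGTACATGAAGTTCGA"  # toy de novo contig with one error (C->A)
    reads = [true_seq[0:8], true_seq[4:12], true_seq[6:14],
             true_seq[8:16], true_seq[12:20]]
    print(polish(draft, reads))  # prints ATGGCGTACCTGAAGTTCGA
```

Real short-read data would also need gapped alignment, base-quality weighting, and handling of repeats, which this sketch deliberately omits.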

Conclusions: The entire sequencing and shotgun assembly takes less than one week, and consumables cost only about US$3 per clone, a significant advantage over previous approaches.
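Taking the approximate figures above at face value (about 800 clones per flow-cell lane at about US$3 per clone), the consumables cost for one lane works out as follows. This is a back-of-the-envelope estimate derived from the stated figures, not a number reported in the abstract:

$$
800\ \text{clones/lane} \times \mathrm{US}\$3\ \text{per clone} \approx \mathrm{US}\$2{,}400\ \text{per lane}
$$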

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • DNA, Complementary / genetics*
  • Gene Library
  • Humans
  • Nucleic Acid Hybridization / genetics*
  • Open Reading Frames / genetics
  • Reference Standards
  • Sequence Analysis, DNA / economics*
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / standards
  • Toxoplasma / genetics

Substances

  • DNA, Complementary