Scaffolding a Caenorhabditis nematode genome with RNA-seq

Genome Res. 2010 Dec;20(12):1740-7. doi: 10.1101/gr.111021.110. Epub 2010 Oct 27.

Abstract

Efficient sequencing of animal and plant genomes by next-generation technology should allow many neglected organisms of biological and medical importance to be better understood. As a test case, we have assembled a draft genome of Caenorhabditis sp. 3 PS1010 through a combination of direct sequencing and scaffolding with RNA-seq. We first sequenced genomic DNA and mixed-stage cDNA using paired 75-nt reads from an Illumina GAII. A set of 230 million genomic reads yielded an 80-Mb assembly, with a supercontig N50 of 5.0 kb, covering 90% of 429 kb from previously published genomic contigs. Mixed-stage poly(A)+ cDNA gave 47.3 million mappable 75-mers (including 5.1 million spliced reads), which separately assembled into 17.8 Mb of cDNA, with an N50 of 1.06 kb. By further scaffolding our genomic supercontigs with cDNA, we increased their N50 to 9.4 kb, nearly double the average gene size in C. elegans. We predicted 22,851 protein-coding genes, and detected expression in 78% of them. Multigenome alignment and data filtering identified 2672 DNA elements conserved between PS1010 and C. elegans that are likely to encode regulatory sequences or previously unknown ncRNAs. Genomic and cDNA sequencing followed by joint assembly is a rapid and useful strategy for biological analysis.
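
The abstract reports assembly contiguity as N50 (5.0 kb before and 9.4 kb after cDNA scaffolding). As a minimal sketch of how that statistic is commonly computed, assuming the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length), the function below may help. The name n50 and the toy lengths are illustrative assumptions, not values or software from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the common definition: sort lengths from longest to shortest,
    accumulate them, and report the length at which the running total
    first reaches half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy supercontig lengths in kb (hypothetical, not data from the paper).
before_scaffolding = [2, 3, 4, 5, 6]

# Suppose scaffolding joins the 2, 3, and 4 kb pieces into one 9 kb piece.
after_scaffolding = [9, 5, 6]

n50_before = n50(before_scaffolding)
n50_after = n50(after_scaffolding)
```

In this toy case the total length is unchanged by joining pieces, but N50 rises because fewer, longer sequences are needed to reach half of the assembly. That is the sense in which scaffolding with cDNA can increase N50 without adding sequence.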

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Caenorhabditis / genetics*
  • Conserved Sequence / genetics
  • DNA, Complementary / genetics
  • Genome / genetics*
  • Genomics / methods*
  • Molecular Sequence Data
  • Phylogeny
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • DNA, Complementary

Associated data

  • GENBANK/AEHI01000000