Characterization of the human ESC transcriptome by hybrid sequencing

Proc Natl Acad Sci U S A. 2013 Dec 10;110(50):E4821-30. doi: 10.1073/pnas.1320101110. Epub 2013 Nov 26.

Abstract

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.

Keywords: PacBio; alternative splicing; hESC transcriptome; isoform discovery; lncRNA.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing / genetics*
  • Embryonic Stem Cells / chemistry
  • Embryonic Stem Cells / metabolism*
  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Male
  • Protein Isoforms / genetics*
  • Transcriptome / genetics*

Substances

  • Protein Isoforms

Associated data

  • GEO/GSE51861