Ultra-deep sequencing of ribosome-associated poly-adenylated RNA in early Drosophila embryos reveals hundreds of conserved translated sORFs

DNA Res. 2016 Dec;23(6):571-580. doi: 10.1093/dnares/dsw040. Epub 2016 Aug 24.

Abstract

There is growing recognition that small open reading frames (sORFs) encoding peptides shorter than 100 amino acids are an important class of functional elements in the eukaryotic genome, several of which have already been shown to play critical roles in growth, development, and disease. However, our understanding of their biological importance has been limited by the significant technical challenges of annotating them. Here we combined ultra-deep sequencing of ribosome-associated poly-adenylated RNAs with rigorous conservation analysis to identify a comprehensive population of translated sORFs during early Drosophila embryogenesis. In total, we identify 399 sORFs, including sORFs previously annotated but lacking evidence of translation, sORFs within transcripts previously classified as non-coding, and sORFs not previously known to be transcribed. We also find, for the first time, evidence for translation of many sORFs with different isoforms, suggesting that their regulation is as complex as that of longer ORFs. Furthermore, many of these sORFs are not found associated with ribosomes in Drosophila S2 cells, which are derived from late-stage embryos, suggesting that many of the translated sORFs may have stage-specific functions during embryogenesis. These results thus provide the first comprehensive annotation of the sORFs present during early Drosophila embryogenesis, a necessary basis for a detailed delineation of their functions in embryogenesis and other biological processes.
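The abstract defines sORFs by length alone: ORFs encoding peptides shorter than 100 amino acids. As a minimal sketch of that definition only, assuming a forward-strand scan with ATG starts and the standard stop codons (TAA, TAG, TGA), the hypothetical helper `find_sorfs` below lists candidate sORFs in a DNA sequence. It is not the authors' pipeline: it uses no ribosome-association data and no PhyloCSF conservation scoring, which the study relies on to call an sORF translated.

```python
# Hypothetical illustration of the sORF length definition; NOT the study's method.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons
MAX_SORF_AA = 100  # sORF threshold from the abstract: peptides shorter than 100 aa


def find_sorfs(seq: str, max_aa: int = MAX_SORF_AA) -> list[tuple[int, int, int]]:
    """Return (start, end, length_aa) for ATG-initiated ORFs on the forward strand
    whose encoded peptide is shorter than max_aa residues.

    start is 0-based, end is exclusive and includes the stop codon; length_aa
    counts residues from the initial Met up to (not including) the stop.
    ORFs that run off the end of the sequence without a stop are not reported.
    """
    seq = seq.upper()
    sorfs = []
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        # Step through complete codons only (i + 3 <= len(seq)).
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if length_aa < max_aa:
                    sorfs.append((start, i + 3, length_aa))
                start = None  # reset and look for the next ATG in this frame
    return sorted(sorfs)


if __name__ == "__main__":
    print(find_sorfs("CCATGAAATTTTAGCC"))  # → [(2, 14, 3)], i.e. Met-Lys-Phe
```

Because each frame is read from its first ATG to the next in-frame stop, an ATG nested inside a longer ORF is not reported separately; a fuller annotation would also scan the reverse strand and consider alternative start sites.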

Keywords: PhyloCSF; early Drosophila embryo; sORFs; small open reading frames; translatome.

MeSH terms

  • Animals
  • Conserved Sequence*
  • Drosophila / embryology
  • Drosophila / genetics*
  • Drosophila Proteins / genetics
  • Drosophila Proteins / metabolism
  • Gene Expression Regulation, Developmental*
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Annotation
  • Open Reading Frames*
  • RNA, Messenger / chemistry
  • RNA, Messenger / genetics*
  • Ribosomes / metabolism
  • Sequence Analysis, RNA

Substances

  • Drosophila Proteins
  • RNA, Messenger