Owing largely to advances in next-generation sequencing (NGS), the amount of NGS data is increasing rapidly. Among the many NGS applications, RNA sequencing (RNA-seq) is one of the most commonly used and is rapidly replacing microarray-based techniques in laboratories around the world. As more of these techniques are standardized, allowing technicians to perform experiments with minimal hands-on time and reduced experimental and operator-dependent biases, the bottleneck has become clearly visible: data analysis. Further complicating the matter, increasing evidence suggests that most of the genome is transcribed into RNA, yet the majority of these RNAs are not translated into proteins. Such RNAs are called noncoding RNAs (ncRNAs). Although some time has passed since the discovery of ncRNAs, their annotations remain poor, which makes the analysis of RNA-seq data challenging. Here, we examine the current limitations of RNA-seq analysis using case studies focused on the detection of novel transcripts and the examination of their characteristics. Finally, we validate the presence of novel transcripts using biological experiments, showing that novel transcripts can be accurately identified when a series of filters is applied. In conclusion, novel transcripts identified from RNA-seq data must be examined carefully before proceeding to biological experiments.
Keywords: RNA-seq; gene expression; lncRNA; novel transcripts; transcriptome assembly.
© The Author 2015. Published by Oxford University Press.