Background: Transcriptomics reveals the existence of transcripts of different coding potential and strand orientation. Alternative splicing (AS) can yield proteins with altered number and types of functional domains, suggesting the global occurrence of transcriptional and post-transcriptional events. Many biological processes, including seed maturation and desiccation, are regulated post-transcriptionally (e.g., by AS), leading to the production of more than one coding or noncoding sense transcript from a single locus.
Results: We present an integrated computational framework to predict isoform-specific functions of plant transcripts. This framework includes a novel plant-specific weighted support vector machine classifier called CodeWise, which predicts the coding potential of transcripts with over 96 % accuracy, and several other tools enabling global sequence similarity, functional domain, and co-expression network analyses. First, this framework was applied to all detected transcripts (103,106), out of which 13 % was predicted by CodeWise to be noncoding RNAs in developing soybean embryos. Second, to investigate the role of AS during soybean embryo development, a population of 2,938 alternatively spliced and differentially expressed splice variants was analyzed and mined with respect to timing of expression. Conserved domain analyses revealed that AS resulted in global changes in the number, types, and extent of truncation of functional domains in protein variants. Isoform-specific co-expression network analysis using ArrayMining and clustering analyses revealed specific sub-networks and potential interactions among the components of selected signaling pathways related to seed maturation and the acquisition of desiccation tolerance. These signaling pathways involved abscisic acid- and FUSCA3-related transcripts, several of which were classified as noncoding and/or antisense transcripts and were co-expressed with corresponding coding transcripts. Noncoding and antisense transcripts likely play important regulatory roles in seed maturation- and desiccation-related signaling in soybean.
Conclusions: This work demonstrates how our integrated framework can be implemented to make experimentally testable predictions regarding the coding potential, co-expression, co-regulation, and function of transcripts and proteins related to a biological process of interest.