In vertebrate genes, long introns flanked by short exons are thought to require spliceosome assembly across exons (exon definition) rather than across introns (intron definition), which would require transcription of the downstream exon before an upstream intron can be spliced. Here, we developed CoLa-seq (co-transcriptional lariat sequencing) to investigate the timing and determinants of co-transcriptional splicing genome-wide. Unexpectedly, 90% of all introns, including long introns, can splice before transcription of a downstream exon, indicating that exon definition is not obligatory for most human introns. Still, splicing timing varies dramatically across introns, and various genetic elements determine this variation. Strong U2AF2 binding to the polypyrimidine tract predicts early splicing, explaining exon definition-independent splicing. Together, our findings question the essentiality of exon definition and reveal features beyond intron and exon length that determine splicing timing.
Keywords: CoLa-seq; GC content; U2AF; branch point; co-transcriptional splicing; exon definition; intron definition; lariat RNAs; modeling; polypyrimidine tract.
Copyright © 2022 Elsevier Inc. All rights reserved.