Defining gene ends: RNA polymerase II CTD threonine 4 phosphorylation marks transcription termination regions genome-wide

Nucleic Acids Res. 2024 Dec 24:gkae1240. doi: 10.1093/nar/gkae1240. Online ahead of print.

Abstract

Defining the beginning of a eukaryotic protein-coding gene is relatively simple: it corresponds to the first ribonucleotide incorporated by RNA polymerase II (Pol II) into the nascent RNA molecule. This nucleotide is protected by capping and retained in the mature messenger RNA (mRNA). In higher eukaryotes, however, the end of the mRNA is separated from the sites of transcription termination by hundreds to thousands of base pairs. Current genomic annotations account only for the end of the mature transcript, that is, the site where pre-mRNA cleavage occurs, while the regions in which transcription terminates remain unannotated. Here, we describe the evidence for a marker of transcription termination that could be widely applicable in genomic studies. Pol II termination regions can be determined genome-wide by detecting Pol II phosphorylated on threonine 4 of its C-terminal domain (Pol II CTD-T4ph). Pol II in this state pauses before leaving the DNA template. To date, this potent mark has been underused because the evidence for its position and role in termination is scattered across multiple publications. We summarize the observations regarding Pol II CTD-T4ph in termination regions and present bioinformatic analyses that further support Pol II CTD-T4ph as a global termination mark in animals.