Conventional gene-based analyses that combine epigenome and transcriptome data, including those conducted by the ENCODE/modENCODE projects, suggest that various histone modifications perform regulatory functions in controlling mRNA expression (collectively referred to as a histone code) in several model animals. While some histone codes were found to be universally adopted across organisms, "species-specific" histone codes have also been defined. We found that the characterization of these histone codes was confounded by factors (e.g. gene essentiality, expression breadth) that are independent of, but correlated with, gene expression levels. Hence, we attempted to decode histone marks in the mouse (Mus musculus), fly (Drosophila melanogaster), and worm (Caenorhabditis elegans) genomes by examining ratios of RNA sequencing (and chromatin immunoprecipitation sequencing) intensities between paralogous genes, thereby removing confounding effects that would otherwise be present in a gene-based approach. With this paralog-based approach, the associations between four histone modifications (H3K4me3, H3K27ac, H3K9ac, and H3K36me3) and gene expression are substantially revised. First, we demonstrate that H3K27ac and H3K9ac represent universal active marks in promoters, rather than worm-specific marks as previously reported. Second, the acting regions of the studied active marks that are common across species (and across a wide range of tissues at different developmental stages) were found to extend beyond the previously defined regions. Thus, the active histone codes analyzed appear to have a universality that has previously been underappreciated. Our results suggest that these universal codes, including those previously considered species-specific, could have an ancient origin and are important in regulating gene expression abundance in animals.
Keywords: Epigenetics; Evolutionary conservation; Gene duplication; Homogeneity; Transcriptional elongation; Transcriptional initiation.
© 2021 The Author(s).
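To make the paralog-based idea in the abstract concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes that each paralog pair is summarized by a log2 ratio of RNA-seq intensities and a log2 ratio of ChIP-seq intensities, and that the association between the two sets of ratios is measured with a Pearson correlation. The function names, the pseudocount, the choice of log2, and the choice of Pearson correlation are all illustrative assumptions that are not stated in the abstract.

```python
import math


def log2_ratio(a, b, pseudocount=1.0):
    """Log2 ratio of two intensities; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def paralog_ratio_correlation(pairs, expression, mark_signal, pseudocount=1.0):
    """Correlate within-pair expression ratios with within-pair histone-mark ratios.

    pairs       -- list of (gene_a, gene_b) paralog pairs (hypothetical identifiers)
    expression  -- dict mapping gene -> RNA-seq intensity
    mark_signal -- dict mapping gene -> ChIP-seq intensity for one histone mark
    """
    expr_ratios = [
        log2_ratio(expression[a], expression[b], pseudocount) for a, b in pairs
    ]
    mark_ratios = [
        log2_ratio(mark_signal[a], mark_signal[b], pseudocount) for a, b in pairs
    ]
    return pearson(expr_ratios, mark_ratios)
```

A positive correlation from `paralog_ratio_correlation` would be read, under these assumptions, as the mark behaving like an active mark: within a pair, the paralog carrying more of the mark also tends to be more highly expressed. Because both genes in a pair are compared with each other rather than with the rest of the genome, the sketch illustrates how a ratio-based comparison differs from a gene-based one.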