Structural changes in chromatin modulate how accessible DNA is to the molecular machinery that controls transcription. These changes are linked to variations in epigenetic marks, and the pattern of histone marks makes it possible to classify chromatin into distinct functional states. Importantly, alterations in chromatin states are known to be associated with various diseases, and changes in these states help explain processes such as cellular proliferation. However, for most available samples, there are not enough epigenomic data to accurately determine the chromatin states of the cells in each sample. This is mainly due to the high cost of these experiments, but also to insufficient sample amounts or sample degradation. In this work, we describe a cascade method based on a random forest algorithm that infers epigenetic marks and, in doing so, identifies relationships between different histone marks. Our approach also reduces the number of experimentally determined marks required to assign chromatin states. Moreover, the relationships we identified between patterns of different histone marks strengthen the evidence in favor of a redundant epigenetic code.
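The general idea of a cascade of random forests can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation. It assumes that each histone mark has been binarized per genomic bin (1 = present, 0 = absent), that missing marks are imputed one at a time in a fixed, user-chosen order, and that each later forest receives the marks imputed by earlier stages as extra input features. The function names (fit_cascade, predict_cascade and so on), the hyperparameters and the marks A to E in the toy data are all hypothetical.

```python
# Hypothetical sketch of a cascade of random forests for imputing histone marks.
# Assumption: each mark is binarized per genomic bin (1 = present, 0 = absent).
import itertools
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, depth, max_depth, n_try, rng):
    """Grow one decision tree, choosing among random candidate features per node."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: predicted label (0 or 1)
    order = list(range(len(rows[0])))
    rng.shuffle(order)
    best = None
    for k, f in enumerate(order):
        # Inspect n_try features; look further only if none gave a valid split.
        if k >= n_try and best is not None:
            break
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority(labels)
    f = best[1]
    children = []
    for value in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[f] == value]
        children.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                   depth + 1, max_depth, n_try, rng))
    return (f, children[0], children[1])  # internal node

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, if_absent, if_present = node
        node = if_present if x[f] == 1 else if_absent
    return node

def fit_forest(rows, labels, n_trees=25, max_depth=6, seed=0):
    """Random forest: bootstrap-resampled trees, about sqrt(p) candidate features per split."""
    rng = random.Random(seed)
    n_try = max(1, round(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 0, max_depth, n_try, rng))
    return forest

def predict_forest(forest, x):
    """Majority vote over trees (an odd n_trees avoids ties)."""
    return int(2 * sum(predict_tree(t, x) for t in forest) > len(forest))

def fit_cascade(observed, targets, order, **kw):
    """Train one forest per missing mark, in `order`; each forest also sees the
    marks imputed by earlier stages as extra input features."""
    rows = [list(r) for r in observed]
    models = []
    for name in order:
        forest = fit_forest(rows, targets[name], **kw)
        models.append((name, forest))
        # Append this stage's predictions (not the true values), matching inference.
        rows = [r + [predict_forest(forest, r)] for r in rows]
    return models

def predict_cascade(models, observed_row):
    row, imputed = list(observed_row), {}
    for name, forest in models:
        imputed[name] = predict_forest(forest, row)
        row.append(imputed[name])
    return imputed

# Toy data (hypothetical): observed marks A, B, C cover all 8 combinations 20 times;
# the "missing" marks D and E are fixed logical functions of them.
bins = [list(p) for p in itertools.product([0, 1], repeat=3)] * 20
targets = {"D": [a & b for a, b, c in bins], "E": [(a & b) | c for a, b, c in bins]}
models = fit_cascade(bins, targets, order=["D", "E"])
predictions = {p: predict_cascade(models, p) for p in itertools.product([0, 1], repeat=3)}
print(predictions[(1, 1, 0)])
```

During training, each stage passes its own predictions, rather than the true values, to the next stage, so later forests see the same kind of input they will receive at inference time. A library implementation, such as scikit-learn's RandomForestClassifier, could replace the hand-written forest without changing the cascade logic.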
Keywords: Random Forest; chromatin states; epigenetic marks.
© The Author(s) 2024. Published by Oxford University Press.