Robust chromatin state annotation

Mehdi Foroozandeh Shahraki; Marjan Farahbod; Maxwell W Libbrecht

doi:10.1101/gr.278343.123

Genome Res. 2024 Apr 25;34(3):469-483. doi: 10.1101/gr.278343.123.

Authors

Mehdi Foroozandeh Shahraki¹, Marjan Farahbod¹, Maxwell W Libbrecht²

Affiliations

¹ School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada.
² School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada.

Abstract

With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%-69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to chromatin state annotations. SAGAconf works with any SAGA method and assigns an r-value to each genomic bin of a chromatin state annotation that represents the probability that the label of this bin will be reproduced in a replicated experiment. Thus, SAGAconf allows a researcher to select only the reliable predictions from a chromatin annotation for use in downstream analyses.
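The replicate comparison and the r-value described above can be illustrated with a minimal Python sketch. It is not SAGAconf's implementation. The function names (`per_label_reproducibility`, `histogram_r_values`, `confident_bins`), the assumption that labels have already been matched one-to-one between the two replicate annotations, and the use of simple histogram calibration to turn posterior probabilities into per-bin reproduction probabilities are all illustrative assumptions. The labels "Enh", "Tss" and "Quies" are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def per_label_reproducibility(rep1: Sequence[str], rep2: Sequence[str]) -> Dict[str, float]:
    """Fraction of bins carrying each label in rep1 that carry the same label in rep2.

    Assumes both annotations cover the same genomic bins in the same order and
    that labels were already matched across replicates (illustrative setup).
    """
    if len(rep1) != len(rep2):
        raise ValueError("annotations must cover the same bins")
    total = Counter(rep1)
    agree = Counter(a for a, b in zip(rep1, rep2) if a == b)
    return {label: agree[label] / n for label, n in total.items()}


def histogram_r_values(posteriors: Sequence[float],
                       reproduced: Sequence[bool],
                       n_bins: int = 10) -> List[float]:
    """Map each bin's posterior to the empirical reproduction rate of its posterior bucket.

    Histogram calibration is used here only as a simple stand-in for
    estimating "probability that this bin's label is reproduced".
    """
    def bucket(p: float) -> int:
        # Clamp p == 1.0 into the last bucket.
        return min(int(p * n_bins), n_bins - 1)

    hits: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for p, ok in zip(posteriors, reproduced):
        b = bucket(p)
        counts[b] += 1
        hits[b] += int(ok)
    return [hits[bucket(p)] / counts[bucket(p)] for p in posteriors]


def confident_bins(r_values: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Indices of genomic bins whose r-value meets the threshold."""
    return [i for i, r in enumerate(r_values) if r >= threshold]


if __name__ == "__main__":
    rep1 = ["Enh", "Enh", "Enh", "Tss", "Quies", "Quies"]
    rep2 = ["Enh", "Quies", "Tss", "Tss", "Quies", "Enh"]
    print(per_label_reproducibility(rep1, rep2))
    # {'Enh': 0.3333333333333333, 'Tss': 1.0, 'Quies': 0.5}

    r = histogram_r_values([0.95, 0.92, 0.15, 0.12], [True, True, False, True])
    print(r, confident_bins(r))
    # [1.0, 1.0, 0.5, 0.5] [0, 1]
```

In this toy example only one of the three "Enh" bins from the first replicate keeps its label in the second, the kind of per-label irreproducibility the abstract reports for enhancers. Filtering by r-value then keeps only bins whose labels are expected to replicate. Note that calibrating on the same bins used to measure reproduction, as done here for brevity, would overfit in practice.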

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Chromatin* / genetics
Chromatin* / metabolism
Genomics / methods
Humans
Molecular Sequence Annotation*
Reproducibility of Results

Substances

Chromatin