RNA.DNA:DNA triple helix (triplex) formation is a form of RNA-DNA interaction which regulates gene expression but is difficult to study experimentally in vivo. This makes accurate computational prediction of such interactions highly important in the field of RNA research. Current predictive methods use canonical Hoogsteen base pairing rules, which whilst biophysically valid, may not reflect the plastic nature of cell biology. Here, we present the first optimization approach to learn a probabilistic model describing RNA-DNA interactions directly from motifs derived from triplex sequencing data. We find that there are several stable interaction codes, including Hoogsteen base pairing and novel RNA-DNA base pairings, which agree with in vitro measurements. We implemented these findings in TriplexAligner, a program that uses the determined interaction codes to predict triplex binding. TriplexAligner predicts RNA-DNA interactions identified in all-to-all sequencing data more accurately than all previously published tools in human and mouse and also predicts previously studied triplex interactions with known regulatory functions. We further validated a novel triplex interaction using biophysical experiments. Our work is an important step towards better understanding of triplex formation and allows genome-wide analyses of RNA-DNA interactions.
Keywords: DNA; RNA; RNA–DNA interaction; Triplex; machine learning.
© The Author(s) 2022. Published by Oxford University Press.