RNA-binding proteins (RBPs) are critical components of post-transcriptional gene regulation. Until recently, however, their binding sites have been difficult to determine, both because RBPs show apparently low specificity for their target transcripts and because high-throughput assays for mapping binding sites genome wide were lacking. Here we present a bioinformatics method that predicts RBP binding motif sites genome wide by leveraging motif conservation, RNA secondary structure, and the tendency of RBP binding sites to cluster together. A probabilistic model is learned from bona fide binding sites determined by crosslinking and immunoprecipitation (CLIP) and then applied genome wide to generate high-specificity binding-site predictions.
Keywords: Binding site prediction; CLIP tag clusters; RNA-binding protein (RBP); mCarts; motif.