Uncovering protein-RNA interactions enables a deeper understanding of RNA processing. Recent multiplexed crosslinking and immunoprecipitation (CLIP) technologies, such as antibody-barcoded eCLIP (ABC), dramatically increase the throughput of mapping RNA-binding protein (RBP) binding sites. However, multiplexed CLIP datasets are multivariate, and the signal-to-noise ratio varies across RBPs. To address this, we developed Mudskipper, a versatile computational suite comprising two components: a Dirichlet multinomial mixture model that accounts for the multivariate nature of ABC datasets, and a softmasking approach that identifies and removes non-specific protein-RNA interactions for RBPs with low signal-to-noise ratio. Mudskipper demonstrates superior precision and recall over existing tools on multiplexed datasets and supports analysis of repetitive elements and small non-coding RNAs. Our findings reveal splicing outcomes and variant-associated disruptions, enabling higher-throughput investigations into diseases and regulation mediated by RBPs.
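To make the Dirichlet multinomial mixture idea concrete, the following is a minimal sketch and not Mudskipper's actual implementation. It assumes each genomic window is summarized as a vector of read counts, one entry per multiplexed RBP, and that the mixture weights and concentration parameters are already known (in practice they would have to be fitted, for example by expectation-maximization). The function names `dm_log_pmf` and `responsibilities` and all numeric values are hypothetical and chosen only for illustration.

```python
from math import lgamma, exp, log


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: reads per RBP in one window (non-negative integers).
    alpha:  positive concentration parameters, one per RBP.
    """
    n = sum(counts)
    a = sum(alpha)
    # Normalizing terms: Gamma(A) / Gamma(N + A) and the N! of the
    # multinomial coefficient.
    out = lgamma(a) - lgamma(n + a) + lgamma(n + 1)
    for x, al in zip(counts, alpha):
        # Per-category terms: Gamma(x + alpha) / (Gamma(alpha) * x!).
        out += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return out


def responsibilities(counts, weights, alphas):
    """Posterior probability of each mixture component for one window.

    weights: mixture weights (assumed to sum to 1).
    alphas:  one concentration vector per component.
    """
    logs = [log(w) + dm_log_pmf(counts, a) for w, a in zip(weights, alphas)]
    # Log-sum-exp for numerical stability when normalizing.
    m = max(logs)
    norm = m + log(sum(exp(l - m) for l in logs))
    return [exp(l - norm) for l in logs]


# Hypothetical example: three multiplexed RBPs, two components.
# Component 0 spreads reads evenly (background-like); component 1 is
# enriched for the first RBP.
window = [40, 2, 3]
post = responsibilities(window, [0.5, 0.5], [[1.0, 1.0, 1.0], [10.0, 1.0, 1.0]])
print(post)
```

In this toy setting, a window whose reads come mostly from the first RBP is assigned mainly to the component enriched for that RBP. This illustrates how modeling the joint count vector, rather than each RBP in isolation, can separate protein-specific signal from shared background.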
Keywords: CLIP; RNA; RNA-binding proteins; deep learning; gene regulation; splicing; transcriptomics; variant interpretation.
Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.