pyRBDome: a comprehensive computational platform for enhancing RNA-binding proteome data

Liang-Cui Chu; Niki Christopoulou; Hugh McCaughan; Sophie Winterbourne; Davide Cazzola; Shichao Wang; Ulad Litvin; Salomé Brunon; Patrick Jb Harker; Iain McNae; Sander Granneman

doi:10.26508/lsa.202402787

Life Sci Alliance. 2024 Jul 30;7(10):e202402787. doi: 10.26508/lsa.202402787. Print 2024 Oct.

Authors

Liang-Cui Chu¹,², Niki Christopoulou¹,², Hugh McCaughan¹,², Sophie Winterbourne², Davide Cazzola¹, Shichao Wang¹,², Ulad Litvin¹,³, Salomé Brunon¹,⁴, Patrick Jb Harker¹,⁵, Iain McNae², Sander Granneman⁶,²

Affiliations

¹ Centre for Engineering Biology, University of Edinburgh, Edinburgh, UK.
² Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK.
³ MRC-University of Glasgow Centre for Virus Research, Glasgow, UK.
⁴ Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Paris, France.
⁵ Cancer Research UK Cancer Biomarker Centre, University of Manchester, Manchester, UK.
⁶ Centre for Engineering Biology, University of Edinburgh, Edinburgh, UK.

Abstract

High-throughput proteomics approaches have revolutionised the identification of RNA-binding proteins (RBPome) and RNA-binding sequences (RBDome) across organisms. Yet the extent of noise associated with these methodologies, including false positives, is difficult to quantify because experimental approaches for validating the results are generally low-throughput. To address this, we introduce pyRBDome, a pipeline for enhancing RNA-binding proteome data in silico. It aligns the experimental results with RNA-binding site (RBS) predictions from distinct machine-learning tools and integrates high-resolution structural data when available. Its statistical evaluation of RBDome data enables quick identification of likely genuine RNA-binders in experimental datasets. Furthermore, by leveraging the pyRBDome results, we have enhanced the sensitivity and specificity of RBS detection through training new ensemble machine-learning models. pyRBDome analysis of a human RBDome dataset, compared with known structural data, revealed that although UV-cross-linked amino acids were more likely to contain predicted RBSs, they infrequently bind RNA in high-resolution structures. This discrepancy underscores the limitations of structural data as benchmarks, positioning pyRBDome as a valuable alternative for increasing confidence in RBDome datasets.
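The idea of aligning cross-linked residues with RBS predictions and evaluating the overlap statistically can be illustrated with a small sketch. This is not pyRBDome's code: the abstract does not name the predictors or the statistical test, so the tool names (`tool_a`, `tool_b`, `tool_c`), the majority-style consensus rule, the toy residue positions, and the choice of a one-sided hypergeometric test are all assumptions made purely for illustration.

```python
from math import comb


def consensus_sites(predictions, min_tools=2):
    """Return residue positions flagged by at least `min_tools` predictors.

    `predictions` maps a (hypothetical) tool name to a set of residue positions.
    """
    counts = {}
    for sites in predictions.values():
        for pos in sites:
            counts[pos] = counts.get(pos, 0) + 1
    return {pos for pos, c in counts.items() if c >= min_tools}


def hypergeom_enrichment_p(n_residues, n_predicted, n_crosslinked, n_overlap):
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    Null model (an assumption of this sketch): the cross-linked residues are
    drawn at random, without replacement, from all residues of the protein.
    """
    total = comb(n_residues, n_crosslinked)
    upper = min(n_predicted, n_crosslinked)
    tail = sum(
        comb(n_predicted, k) * comb(n_residues - n_predicted, n_crosslinked - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy data: predicted RBS positions from three made-up tools
predictions = {
    "tool_a": {3, 4, 5, 10},
    "tool_b": {4, 5, 6, 10},
    "tool_c": {5, 10, 20},
}
crosslinked = {4, 5, 30}   # hypothetical UV-cross-linked residue positions
protein_length = 40        # hypothetical protein length

consensus = consensus_sites(predictions)
overlap = crosslinked & consensus
p_value = hypergeom_enrichment_p(
    protein_length, len(consensus), len(crosslinked), len(overlap)
)
print(sorted(consensus), sorted(overlap), round(p_value, 4))
```

A small p-value here would suggest that cross-linked residues coincide with predicted binding sites more often than expected by chance. A real analysis would also need multiple-testing correction across proteins, which this sketch omits.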

MeSH terms

Binding Sites
Computational Biology* / methods
Databases, Protein
Humans
Machine Learning*
Protein Binding
Proteome* / metabolism
Proteomics* / methods
RNA* / chemistry
RNA* / metabolism
RNA-Binding Proteins* / chemistry
RNA-Binding Proteins* / metabolism
Software

Substances

Proteome
RNA-Binding Proteins
RNA

Associated data

PDB/O00425
PDB/6GX6
PDB/4un3
PDB/4AM3

Grants and funding

WT_/Wellcome Trust/United Kingdom