Sequence-based prediction of microRNA-binding residues in proteins using cost-sensitive Laplacian support vector machines

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):752-9. doi: 10.1109/TCBB.2013.75.

Abstract

The recognition of microRNA (miRNA)-binding residues in proteins helps explain how miRNAs silence their target genes. Existing computational methods are difficult to apply to the prediction of miRNA-binding residues in proteins because few training examples are available. To address this issue, unlabeled data may be exploited to help construct a computational model. Semisupervised learning comprises methods that automatically exploit unlabeled data in addition to labeled data to improve learning performance, with no human intervention assumed. In addition, miRNA-binding proteins almost always contain far fewer binding than nonbinding residues, and cost-sensitive learning is regarded as a good solution to this class imbalance problem. In this work, a novel model is proposed for recognizing miRNA-binding residues in proteins from sequences using a cost-sensitive extension of Laplacian support vector machines (CS-LapSVM) with a hybrid feature. The hybrid feature consists of evolutionary information from the amino acid sequence (position-specific scoring matrices), conservation information about three biochemical properties (HKM), and mutual interaction propensities in protein-miRNA complex structures. The CS-LapSVM achieves good performance, with an F1 score of 26.23 ± 2.55% and an AUC value of 0.805 ± 0.020, which is superior to existing approaches for the recognition of RNA-binding residues. A web server called SARS has been built and is freely available for academic use.
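
As a rough illustration of how cost-sensitive and semisupervised terms can be combined, the following minimal sketch evaluates one common textbook form of a cost-weighted Laplacian-regularized objective. It is not the authors' implementation: the linear scorer, the Gaussian affinity graph, and the names `cost_pos`, `cost_neg`, `gamma_a`, and `gamma_i` are assumptions made for this example, and the abstract does not state the exact formulation used in CS-LapSVM.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix W with a zero diagonal (assumed graph choice)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, where D holds the row sums of W."""
    return np.diag(W.sum(axis=1)) - W


def cost_sensitive_hinge(y, f, cost_pos, cost_neg):
    """Mean hinge loss over labeled residues, weighted by a per-class cost."""
    costs = np.where(y > 0, cost_pos, cost_neg)
    return float(np.mean(costs * np.maximum(0.0, 1.0 - y * f)))


def cs_lap_objective(w, X, y_labeled, W, cost_pos, cost_neg, gamma_a, gamma_i):
    """Objective for a linear scorer f(x) = w . x (a simplification for illustration).

    The first len(y_labeled) rows of X are labeled (+1 binding, -1 nonbinding);
    the remaining rows are unlabeled and enter only through the graph term.
    """
    f = X @ w
    n = X.shape[0]
    n_labeled = len(y_labeled)
    loss = cost_sensitive_hinge(y_labeled, f[:n_labeled], cost_pos, cost_neg)
    ambient = gamma_a * float(w @ w)  # penalizes large weights
    manifold = gamma_i / n ** 2 * float(f @ graph_laplacian(W) @ f)  # smoothness on the graph
    return loss + ambient + manifold
```

Setting `cost_pos` larger than `cost_neg` makes a missed binding residue more expensive than a false alarm, which is one way to counter the class imbalance described above. The graph term penalizes scores that differ between neighboring residues in feature space, which is how unlabeled residues can influence the fit.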

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites*
  • Databases, Protein
  • MicroRNAs / chemistry
  • MicroRNAs / metabolism*
  • Position-Specific Scoring Matrices
  • Protein Binding
  • Proteins / chemistry
  • Proteins / metabolism*
  • Sequence Analysis, Protein / methods*
  • Support Vector Machine*

Substances

  • MicroRNAs
  • Proteins