Predicting human microRNA precursors based on an optimized feature subset generated by GA-SVM

Genomics. 2011 Aug;98(2):73-8. doi: 10.1016/j.ygeno.2011.04.011. Epub 2011 May 14.

Abstract

MicroRNAs (miRNAs) are non-coding RNAs that play important roles in post-transcriptional regulation. Identifying miRNAs is crucial to understanding their biological mechanisms. Recently, machine-learning approaches have been employed to predict miRNA precursors (pre-miRNAs). However, the features used differ from method to method, and classification performance varies accordingly. Feature selection is therefore critical for pre-miRNA prediction. We generated an optimized subset of 13 features using a hybrid of a genetic algorithm and a support vector machine (GA-SVM). In five-fold cross-validation with an SVM, the optimized feature subset achieved much higher classification performance than the two feature sets used in microPred and miPred. Finally, we constructed the classifier miR-SF and used it to predict the most recently identified human pre-miRNAs in miRBase (version 16). Compared with microPred and miPred, miR-SF achieved much higher classification performance: accuracies were 93.97%, 86.21% and 64.66% for miR-SF, microPred and miPred, respectively. Thus, miR-SF is effective for identifying pre-miRNAs.
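
The GA-SVM hybrid mentioned in the abstract is a wrapper-style feature selection: a genetic algorithm searches over subsets of candidate features, and each subset is scored by how well a classifier performs with it. The abstract does not state the GA settings, the encoding, or the exact fitness function, so the following is only a minimal sketch under assumptions. The bit-mask encoding, tournament selection, one-point crossover, elitism, and all parameter values are illustrative choices, and `toy_fitness` is a hypothetical stand-in for what would, in a GA-SVM setup, be the cross-validated accuracy of an SVM trained on the selected features.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a feature subset (bit mask) that maximizes `fitness`.

    Assumes n_features >= 2. All defaults are illustrative, not the paper's.
    """
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags: 1 means "feature is selected".
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament():
            # Pick two individuals at random and keep the fitter one.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        next_pop = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)

        population = next_pop
        best = max(population, key=fitness)

    return best

# Hypothetical stand-in for "cross-validated SVM accuracy on this subset":
# rewards three made-up informative features and penalizes subset size.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)

best_mask = ga_select(10, toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
```

To move from this sketch toward an actual GA-SVM, `toy_fitness` would be replaced by a function that trains an SVM on the columns flagged in the mask and returns its mean five-fold cross-validation accuracy. The GA loop itself would not need to change.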

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Base Sequence
  • Computational Biology
  • Databases, Nucleic Acid
  • Humans
  • MicroRNAs / genetics*
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • RNA Precursors / genetics*
  • RNA, Small Interfering / genetics
  • Sequence Analysis, RNA / methods*

Substances

  • MicroRNAs
  • RNA Precursors
  • RNA, Small Interfering