Mammalian microRNA prediction through a support vector machine model of sequence and structure

PLoS One. 2007 Sep 26;2(9):e946. doi: 10.1371/journal.pone.0000946.

Abstract

Background: MicroRNAs (miRNAs) are endogenous small noncoding RNA gene products, on average 22 nt long, found in a wide variety of organisms. They play important regulatory roles by targeting mRNAs for degradation or translational repression. There are 377 known mouse miRNAs and 475 known human miRNAs in the May 2007 release of the miRBase database, the majority of which are conserved between the two species. Several recent reports suggest that many mammalian miRNAs remain to be discovered. Because additional miRNAs may be expressed at lower levels or in more specialized expression contexts, genome sequence information should be exploited to accelerate their discovery.

Methodology/principal findings: In this article, we describe a computational method, mirCoS, that uses three support vector machine models sequentially to discover new miRNA candidates in mammalian genomes based on sequence, secondary structure, and conservation. mirCoS can efficiently detect the majority of known miRNAs and predicts an extensive set of hairpin structures based on human-mouse comparisons. In total, 3476 mouse candidates and 3441 human candidates were found. These hairpins are more similar to known miRNAs than to negative controls in several aspects not considered by the prediction algorithm. A significant fraction of predictions is supported by existing expression evidence.
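The idea of applying several support vector machine models one after another can be illustrated with a minimal sketch. This is not the mirCoS implementation. It assumes, purely for illustration, that each stage is an already-trained linear SVM that scores one feature group, and that a candidate is kept only if every stage accepts it. The abstract does not say how the three models divide the features, so the stage names, feature vectors, weights, and biases below are all hypothetical toy values.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class LinearSVM:
    """Decision function of an already-trained linear SVM: f(x) = w . x + b."""
    weights: Sequence[float]
    bias: float

    def decision(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def accepts(self, x: Sequence[float]) -> bool:
        # A positive decision value places x on the "miRNA-like" side
        # of the separating hyperplane.
        return self.decision(x) > 0.0


def cascade(candidates: List[Dict], stages: List[Tuple[str, LinearSVM]]) -> List[Dict]:
    """Apply the models sequentially; each stage only sees survivors of the previous one."""
    survivors = list(candidates)
    for feature_name, model in stages:
        survivors = [c for c in survivors if model.accepts(c[feature_name])]
    return survivors


# Hypothetical stages and weights (assumptions, not values from the article).
stages = [
    ("sequence", LinearSVM(weights=[1.0, -1.0], bias=0.0)),
    ("structure", LinearSVM(weights=[2.0], bias=-1.0)),
    ("conservation", LinearSVM(weights=[1.0], bias=-0.5)),
]

# Hypothetical candidate hairpins with made-up feature values.
candidates = [
    {"name": "hairpin_a", "sequence": [0.8, 0.2], "structure": [0.9], "conservation": [0.9]},
    {"name": "hairpin_b", "sequence": [0.1, 0.7], "structure": [0.9], "conservation": [0.9]},
    {"name": "hairpin_c", "sequence": [0.9, 0.1], "structure": [0.3], "conservation": [0.9]},
    {"name": "hairpin_d", "sequence": [0.7, 0.3], "structure": [0.8], "conservation": [0.2]},
]

kept = [c["name"] for c in cascade(candidates, stages)]
print(kept)
```

In this toy run only hairpin_a passes all three stages; the other candidates are each rejected at a different stage. A sequential design of this kind lets a cheap early filter discard most candidates before later models are evaluated, although whether that is the motivation in mirCoS is not stated in the abstract.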

Conclusions/significance: Using a novel approach, mirCoS performs comparably to or better than existing miRNA prediction methods, and contributes a significant number of new candidate miRNAs for experimental verification.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Computational Biology / methods*
  • Gene Expression Regulation
  • Genetic Vectors
  • Humans
  • Mammals / genetics*
  • Mice
  • MicroRNAs / genetics*
  • MicroRNAs / metabolism
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Sequence Analysis, RNA

Substances

  • MicroRNAs
  • RNA, Messenger