Specific alignment of structured RNA: stochastic grammars and sequence annealing

Bioinformatics. 2008 Dec 1;24(23):2677-83. doi: 10.1093/bioinformatics/btn495. Epub 2008 Sep 16.

Abstract

Motivation: Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment which couples a generative probabilistic model of sequence and structure with an efficient sequence annealing approach for exploring the space of multiple alignments. This leads to a new software program, Stemloc-AMA, that is both accurate and specific in the alignment of multiple related RNA sequences.

Results: When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to and better specificity than the best competing methods. We use a large-scale random sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. Such accurate and specific alignments are crucial for comparative-genomics analysis, from inferring phylogeny to estimating substitution rates across different lineages.

Availability: Stemloc-AMA is available from http://biowiki.org/StemLocAMA as part of the dart software package for sequence analysis.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Base Sequence
  • Computational Biology / methods
  • Genome
  • Genomics
  • RNA, Untranslated / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, RNA / methods*
  • Software

Substances

  • RNA, Untranslated