Accurate computational prediction of the transcribed strand of CRISPR non-coding RNAs

Bioinformatics. 2014 Jul 1;30(13):1805-13. doi: 10.1093/bioinformatics/btu114. Epub 2014 Feb 27.

Abstract

Motivation: CRISPR RNAs (crRNAs) are a type of small non-coding RNA that form a key part of an acquired immune system in prokaryotes. Specific prediction methods find crRNA-encoding loci in nearly half of sequenced bacterial, and three quarters of archaeal, species. These Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) arrays consist of repeat elements alternating with specific spacers. Generally one strand is transcribed, producing long pre-crRNAs, which are processed to short crRNAs that base pair with invading nucleic acids to facilitate their destruction. No current software for the discovery of CRISPR loci predicts the direction of crRNA transcription.

Results: We have developed an algorithm, CRISPRDirection, that accurately predicts the strand of the resulting crRNAs. The method takes CRISPR repeat predictions as input. It calculates parameters from the repeat predictions and their flanking sequences, and combines them by weighted voting. The prediction may use prior coding sequence annotation, but this is not required. CRISPRDirection correctly predicted the orientation of 94% of a reference set of arrays.
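
The weighted voting step described in the Results can be illustrated with a minimal Python sketch. This is not the published Perl implementation: the predictor names, vote values, and weights below are hypothetical placeholders chosen only to show how several independent strand predictions might be combined into a single call.

```python
from typing import Iterable, NamedTuple, Tuple


class Vote(NamedTuple):
    name: str        # hypothetical predictor label, not taken from the paper
    direction: int   # +1 = forward strand, -1 = reverse strand, 0 = abstain
    weight: float    # illustrative weight, not a published parameter


def predict_strand(votes: Iterable[Vote]) -> Tuple[str, float]:
    """Combine per-predictor votes into one strand call by weighted sum."""
    score = sum(v.direction * v.weight for v in votes)
    if score > 0:
        return "forward", score
    if score < 0:
        return "reverse", score
    return "undetermined", score


# Example votes (all values are assumptions for illustration only).
votes = [
    Vote("repeat_feature_a", +1, 0.5),
    Vote("flank_feature", +1, 0.25),
    Vote("repeat_feature_b", -1, 0.25),
    Vote("cds_annotation", 0, 0.5),  # abstains when no annotation is supplied
]

strand, score = predict_strand(votes)
print(strand, score)
```

Letting a predictor abstain with a vote of 0 mirrors the statement that coding sequence annotation may be used but is not required: a missing input simply contributes nothing to the sum.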

Availability and implementation: The Perl source code is freely available from http://bioanalysis.otago.ac.nz/CRISPRDirection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Archaea / genetics
  • Clustered Regularly Interspaced Short Palindromic Repeats*
  • Nucleic Acid Conformation
  • RNA, Untranslated / genetics*

Substances

  • RNA, Untranslated