Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints

Proteins. 2008 May 1;71(2):631-40. doi: 10.1002/prot.21777.

Abstract

Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CGGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Base Sequence
  • Chemical Phenomena
  • Chemistry, Physical
  • Databases, Protein
  • Deoxyribonucleases, Type II Site-Specific / metabolism*
  • Protein Structure, Secondary
  • Sequence Alignment
  • Software
  • Substrate Specificity

Substances

  • Deoxyribonucleases, Type II Site-Specific