Pattern matching between two non-aligned random sequences

Bull Math Biol. 1994 Nov;56(6):1143-62. doi: 10.1007/BF02460290.

Abstract

Given two independent sequences of letters, we seek the probability distribution of the length of the longest matching word. The word may occupy different positions in the two sequences, and we consider both perfect and nearly perfect matching. We derive bounds and approximations for this probability and compare them with other bounds and approximations. The results can be applied to DNA sequences in molecular biology and to generalized matching between two independent random sequences.
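
To make the quantity under study concrete, the following is a minimal simulation sketch, not the paper's method. It estimates the distribution of the longest matching word empirically by brute force, whereas the paper derives analytical bounds and approximations. The function names `longest_match` and `simulate_distribution`, the `max_mismatches` parameter used to model nearly perfect matching, and the assumption of uniform, independent letters over the alphabet "ACGT" are all illustrative choices that do not come from the abstract.

```python
import random
from collections import Counter


def longest_match(a, b, max_mismatches=0):
    """Length of the longest word shared by a and b at any pair of positions.

    A "word" here is a run of consecutive letters compared without gaps.
    Up to max_mismatches mismatched letters are tolerated inside the run
    (0 gives perfect matching). This definition is an assumption made
    for illustration.
    """
    m, n = len(a), len(b)
    best = 0
    # Each relative shift of b against a defines one diagonal to scan.
    for shift in range(-(n - 1), m):
        i = max(shift, 0)        # start index in a
        j = i - shift            # start index in b
        length = min(m - i, n - j)
        left = 0
        mismatches = 0
        # Sliding window holding at most max_mismatches mismatches.
        for right in range(length):
            if a[i + right] != b[j + right]:
                mismatches += 1
            while mismatches > max_mismatches:
                if a[i + left] != b[j + left]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best


def simulate_distribution(m, n, trials=200, alphabet="ACGT",
                          max_mismatches=0, seed=0):
    """Monte Carlo estimate of P(longest match length = L).

    Assumes letters are independent and uniform over the alphabet.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        a = rng.choices(alphabet, k=m)
        b = rng.choices(alphabet, k=n)
        counts[longest_match(a, b, max_mismatches)] += 1
    return {length: c / trials for length, c in sorted(counts.items())}


if __name__ == "__main__":
    # Empirical distributions for perfect matching and for one allowed mismatch.
    print(simulate_distribution(100, 100, trials=200))
    print(simulate_distribution(100, 100, trials=200, max_mismatches=1))
```

The scan costs on the order of m × n letter comparisons per pair of sequences, so it is practical only for short sequences. An empirical distribution of this kind could serve as a rough sanity check on an analytical approximation.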

MeSH terms

  • Base Sequence*
  • DNA / chemistry*
  • Mathematics*
  • Models, Statistical*
  • Pattern Recognition, Automated*
  • Probability
  • Random Allocation

Substances

  • DNA