Using Gaussian model to improve biological sequence comparison

J Comput Chem. 2010 Jan 30;31(2):351-61. doi: 10.1002/jcc.21322.

Abstract

One of the major tasks in biological sequence analysis is comparing biological sequences, which can provide evidence of structural and functional conservation as well as of evolutionary relationships among the sequences. Numerous efficient methods have been developed for sequence comparison, but challenges remain. In this article, we propose a novel method for comparing biological sequences based on a Gaussian model. Instead of comparing the frequencies of k-words in biological sequences directly, we consider the k-word frequency distribution under a Gaussian model, which captures the different expression levels of k-words. The proposed method was tested by similarity search, evaluation on functionally related genes, and phylogenetic analysis. Its performance was further compared with that of alignment-based and alignment-free methods. The results demonstrate that the Gaussian model provides more information about k-word frequencies and improves the efficiency of sequence comparison.
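
The abstract does not spell out how the Gaussian model is applied, so the following Python sketch is only an illustration of the general idea, not the authors' method. It counts overlapping k-words in a DNA sequence, fits a normal distribution to the resulting frequencies, and replaces each frequency with its z-score under that fit before comparing two sequences by Euclidean distance. The function names, the z-score transformation, and the choice of Euclidean distance are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product


def kword_frequencies(seq, k, alphabet="ACGT"):
    """Relative frequency of every possible k-word, using overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid k-words")
    return {w: counts[w] / total for w in words}


def gaussian_zscores(freqs):
    """Assumed transformation: z-score of each frequency under a normal
    distribution fitted to all k-word frequencies of the same sequence."""
    values = list(freqs.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # All k-words equally frequent: no word stands out.
        return {w: 0.0 for w in freqs}
    return {w: (v - mean) / std for w, v in freqs.items()}


def distance(seq_a, seq_b, k):
    """Euclidean distance between the z-score profiles of two sequences."""
    za = gaussian_zscores(kword_frequencies(seq_a, k))
    zb = gaussian_zscores(kword_frequencies(seq_b, k))
    return math.sqrt(sum((za[w] - zb[w]) ** 2 for w in za))


if __name__ == "__main__":
    print(distance("ACGTACGTACGT", "ACGTTTGTACGA", 2))
```

A pairwise distance matrix built this way could feed a similarity search or a tree-building routine, which mirrors the kinds of evaluation the abstract lists, although the paper's actual scoring may differ.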

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Coronavirus / classification
  • Coronavirus / genetics
  • Genome / genetics
  • Models, Biological
  • Models, Statistical*
  • Normal Distribution
  • Phylogeny
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*