A hierarchical approach to aligning collinear regions of genomes

Bioinformatics. 2002 Dec;18(12):1673-80. doi: 10.1093/bioinformatics/18.12.1673.

Abstract

Motivation: As a first approximation, similarity between two long orthologous regions of genomes can be represented by a chain of local similarities. Within such a chain, pairs of successive similarities are collinear (non-conflicting), i.e. the segments involved in the nth similarity precede, in both sequences, the segments involved in the (n+1)th similarity. However, when all similarities between two long sequences are considered, there are usually many conflicts among them. Although some conflicts can be avoided by masking transposons or low-complexity sequences, selecting only those similarities that reflect orthology and thus belong to the evolutionarily true chain is not trivial.
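The collinearity condition described above can be made concrete with a short sketch. The `Similarity` record, its half-open coordinate convention, and the function names below are assumptions introduced for illustration only; they do not come from the paper or its software.

```python
from typing import List, NamedTuple


class Similarity(NamedTuple):
    """Hypothetical record for one local similarity.

    Each similarity pairs a segment [a_start, a_end) in the first sequence
    with a segment [b_start, b_end) in the second (half-open, an assumption).
    """
    a_start: int
    a_end: int
    b_start: int
    b_end: int


def precedes(x: Similarity, y: Similarity) -> bool:
    """True if x's segments come before y's segments in BOTH sequences."""
    return x.a_end <= y.a_start and x.b_end <= y.b_start


def collinear(x: Similarity, y: Similarity) -> bool:
    """Two similarities are collinear if one precedes the other in both sequences."""
    return precedes(x, y) or precedes(y, x)


def count_conflicts(sims: List[Similarity]) -> int:
    """Count unordered pairs of similarities that are not collinear."""
    return sum(
        1
        for i, x in enumerate(sims)
        for y in sims[i + 1:]
        if not collinear(x, y)
    )
```

Under these assumptions, two similarities that overlap in either sequence, or that appear in opposite orders in the two sequences, count as a conflict.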

Results: We propose a simple, hierarchical algorithm for finding the true chain of local similarities. Starting from similarities with low P-values, we resolve each pairwise conflict by deleting the similarity with the higher P-value. This greedy approach constructs a chain of similarities faster than approaches that seek a chain optimal with respect to some global criterion, and makes more sense biologically.
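One possible reading of this greedy procedure is sketched below. It is not the authors' implementation: the `Similarity` record, its coordinate convention, the tie-breaking by input order, and the choice to test each candidate only against similarities already kept are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Similarity(NamedTuple):
    """Hypothetical record: half-open segments in sequences A and B plus a P-value."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    p_value: float


def conflicts(x: Similarity, y: Similarity) -> bool:
    """True if neither similarity precedes the other in both sequences."""
    x_first = x.a_end <= y.a_start and x.b_end <= y.b_start
    y_first = y.a_end <= x.a_start and y.b_end <= x.b_start
    return not (x_first or y_first)


def greedy_chain(sims: List[Similarity]) -> List[Similarity]:
    """Build a collinear chain by visiting similarities from lowest P-value up.

    A candidate is dropped if it conflicts with any similarity already kept,
    so in every such conflict the higher-P-value similarity is the one deleted.
    Ties are broken by input order because sorted() is stable (an assumption
    of this sketch, not something stated in the abstract).
    """
    kept: List[Similarity] = []
    for candidate in sorted(sims, key=lambda s: s.p_value):
        if not any(conflicts(candidate, k) for k in kept):
            kept.append(candidate)
    # Report the chain in order along the first sequence.
    return sorted(kept, key=lambda s: s.a_start)
```

This sketch makes a single sorted pass with a pairwise check against the kept set, which is quadratic in the worst case. No attempt is made here to reproduce the performance characteristics or the hierarchical organization described in the paper.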

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Base Sequence
  • Chickens / genetics
  • Conserved Sequence / genetics
  • Evolution, Molecular
  • Fractals
  • Genome*
  • Humans
  • Mice
  • Molecular Sequence Data
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid
  • Software
  • Takifugu / genetics