We describe a computational approach for finding genes that are functionally related but share no noticeable sequence similarity. Our method, which we call SNAP (similarity-neighborhood approach), reveals conservation of gene order on bacterial chromosomes by combining cross-genome comparison with gene context information. The novel feature of this method is that it does not rely on detecting conserved colinear gene strings. Instead, we introduce the notion of a similarity-neighborhood graph (SN-graph), which is built from chains of two kinds of relationships: similarity between orthologous genes in different genomes, and neighborhood between adjacent genes in the same genome. An SN-cycle is defined as a closed path in the SN-graph and is postulated to preferentially join functionally related gene products that participate in the same biochemical or regulatory process. We demonstrate the substantial non-randomness and functional significance of SN-cycles derived from real genome data and estimate the accuracy of SNAP in assigning broad functional roles to uncharacterized proteins. We also describe examples of the practical application of SNAP to improving the quality of genome annotation.
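To make the SN-graph and SN-cycle definitions concrete, the following is a minimal Python sketch under several assumptions that are not taken from SNAP itself: gene identifiers are unique strings across all genomes; neighborhood ("N") edges join genes within a hypothetical `window` of positions on the same chromosome; similarity ("S") edges come from a precomputed list of ortholog pairs; and an SN-cycle is taken to be a simple closed path whose edges alternate between S and N types, capped at `max_len` genes. The names `build_sn_graph`, `find_sn_cycles`, `window`, and `max_len` are illustrative only.

```python
def build_sn_graph(genomes, orthologs, window=1):
    """Build an SN-graph as an adjacency map: gene -> set of (neighbor, edge_type).

    genomes:   dict mapping genome name -> list of gene ids in chromosomal order
    orthologs: iterable of (gene_a, gene_b) similarity pairs from different genomes
    window:    hypothetical parameter; genes at most `window` positions apart
               on the same chromosome are treated as neighbors
    """
    adj = {}
    # Neighborhood ("N") edges: genes close together in the same genome.
    for order in genomes.values():
        for i, gene in enumerate(order):
            adj.setdefault(gene, set())
            for j in range(i + 1, min(i + window + 1, len(order))):
                other = order[j]
                adj.setdefault(other, set())
                adj[gene].add((other, "N"))
                adj[other].add((gene, "N"))
    # Similarity ("S") edges: orthologous genes in different genomes.
    for a, b in orthologs:
        adj.setdefault(a, set()).add((b, "S"))
        adj.setdefault(b, set()).add((a, "S"))
    return adj


def find_sn_cycles(adj, max_len=6):
    """Return gene sets of simple cycles whose edges alternate N and S.

    Assumption: cycles are explored starting from an N edge, so the closing
    edge must be an S edge. Exhaustive search is exponential, so max_len
    bounds the number of genes per cycle.
    """
    cycles = set()

    def extend(start, path, last_type):
        node = path[-1]
        for nxt, etype in adj[node]:
            if etype == last_type:
                continue  # consecutive edges must differ in type
            if nxt == start:
                # Closing edge must be S (the first edge was N); need >= 4 genes.
                if etype == "S" and len(path) >= 4:
                    cycles.add(frozenset(path))
            elif nxt not in path and len(path) < max_len:
                extend(start, path + [nxt], etype)

    for start in adj:
        for nxt, etype in adj[start]:
            if etype == "N":
                extend(start, [start, nxt], "N")
    return cycles


if __name__ == "__main__":
    # Toy data: a1/a2 are adjacent in genome A, and their orthologs b1/b2
    # are adjacent in genome B, so the four genes close an SN-cycle.
    genomes = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
    orthologs = [("a1", "b1"), ("a2", "b2")]
    print(find_sn_cycles(build_sn_graph(genomes, orthologs)))
    # one cycle containing a1, a2, b1, b2; a3 and b3 have no orthologs
```

The smallest alternating cycle has four genes (two neighboring genes whose orthologs are also neighbors), which corresponds to a conserved gene pair. Longer cycles can link genes whose neighborhoods are conserved only indirectly, which is one way to see how an SN-cycle need not be a colinear gene string. Representing each cycle by its gene set is a simplification chosen here for readability; distinct cycles over the same genes collapse to one entry.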
Copyright 2001 Academic Press.