The 106 small molecule metabolic (SMM) pathways in Escherichia coli are formed by the protein products of 581 genes. We can define 722 domains, nearly all of which are homologous to proteins of known structure, that form all or part of 510 of these proteins. This information allows us to answer general questions on the structural anatomy of the SMM pathway proteins and to trace family relationships and recruitment events within and across pathways. Half the gene products contain a single domain and half are formed by combinations of between two and six domains. The 722 domains belong to one of 213 families that have between one and 51 members. Family members usually conserve their catalytic or cofactor binding properties; substrate recognition is rarely conserved. Of the 213 families, members of only a quarter occur in isolation, i.e. they form single-domain proteins. Most members of the other families combine with domains from just one or two other families and a few more versatile families can combine with several different partners. Excluding isoenzymes, more than twice as many homologues are distributed across pathways as within pathways. However, serial recruitment, with two consecutive enzymes both being recruited to another pathway, is rare and recruitment of three consecutive enzymes is not observed. Only eight of the 106 pathways have a high number of homologues. Homology between consecutive pairs of enzymes with conservation of the main substrate-binding site but change in catalytic mechanism (which would support a simple model of retrograde pathway evolution) occurs only six times in the whole set of enzymes. Most of the domains that form SMM pathways have homologues in non-SMM pathways. Taken together, these results imply a pervasive "mosaic" model for the formation of protein repertoires and pathways.
Copyright 2001 Academic Press.