A major mode of gene regulation occurs via the binding of specific proteins to specific DNA sequences. The availability of complete bacterial genome sequences offers an unprecedented opportunity to describe networks of such interactions by correlating existing experimental data with computational predictions. Of the 240 candidate Escherichia coli DNA-binding proteins, about 55 have DNA-binding sites identified by DNA footprinting. We used these sites to construct recognition matrices, and then used the matrices to search the E. coli genomic sequence for additional binding sites. Many of these matrices show a strong preference for non-coding DNA. We identify discrepancies between matrices derived from natural sites and those derived from SELEX (Systematic Evolution of Ligands by Exponential enrichment) experiments. We have constructed a database of these proteins and binding sites, called DPInteract (available at http://arep.med.harvard.edu/dpinteract).
Copyright 1998 Academic Press.
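
The matrix-based search summarized above can be illustrated with a minimal sketch. The abstract does not say how the recognition matrices were built or scored, so what follows is a generic log-odds position weight matrix and should not be read as the authors' method. The function names (`build_matrix`, `score`, `scan`), the uniform background, the pseudocount, the threshold, and the example sites are all hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def build_matrix(sites, background=None, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned sites.

    Assumption: a uniform background and a pseudocount of 1 per base;
    neither value comes from the source text.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(width):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in BASES}
        )
    return matrix

def score(matrix, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(matrix, window))

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def scan(matrix, sequence, threshold):
    """Return (position, strand, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        for strand, candidate in (("+", window), ("-", reverse_complement(window))):
            s = score(matrix, candidate)
            if s >= threshold:
                hits.append((i, strand, s))
    return hits

# Hypothetical aligned sites, invented for this example (not real footprinting data).
example_sites = ["TGTGA", "TGTGA", "TGTGC", "TGAGA"]
example_matrix = build_matrix(example_sites)
example_hits = scan(example_matrix, "CCCCTGTGACCCC", threshold=5.0)
```

In this sketch both strands are scanned because a protein can bind a site in either orientation. A real search would also need a principled score threshold and a background model matched to the genome's base composition, neither of which is specified in the text above.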