Statistical analyses of protein sequence data derived from genome sequences can identify amino acid residues that interact between proteins or between domains of a single protein. These methods exploit patterns of amino acid variation that arise during evolution under the structural and functional constraints of proteins. The identified residues provide a basis for determining the structure and folding of proteins and for inferring mechanisms of protein function. Several research groups have shown that, when applied to two-component systems, these methods can identify the amino acid interactions between response regulators and histidine kinases, as well as the determinants of their interaction specificity. Recently, statistical analyses of the HisKA and HATPase (ATP-binding) domains of histidine kinases identified amino acid interactions for both the inactive and the active catalytic states of these kinases. The identified interactions were used to generate a model structure for the domain conformation of the active state. This conformation requires partial unwinding of the C-terminal helix of the HisKA domain, which disrupts the residue contacts of the inactive state, and it suggests how signal binding determines the equilibrium between the inactive and active states of histidine kinases. The rapidly growing protein sequence databases from genome, metagenome and microbiome studies are an important resource for the functional and structural understanding of proteins and protein complexes in microbes.
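As a rough illustration of the kind of statistic such coevolution analyses build on, the sketch below scores every pair of columns in a multiple sequence alignment by mutual information (MI): column pairs whose residues co-vary across sequences score high. This is a minimal sketch under stated assumptions, not the method used in the studies summarized here. Plain MI applies no sequence weighting, pseudocounts or gap handling, and it does not separate direct from indirect correlations, which is what approaches such as direct coupling analysis were developed to address. The toy alignment `TOY_MSA` and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log

# Hypothetical toy alignment: equal-length, gap-free sequences (one per row).
# Columns 0 and 2 co-vary (A pairs with K, V with R); column 1 is invariant.
TOY_MSA = ["ALKD", "ALKE", "VLRD", "VLRE"]


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Uses raw frequencies only: no sequence weighting, pseudocounts or gaps.
    """
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)             # counts in column i
    f_j = Counter(seq[j] for seq in msa)             # counts in column j
    f_ij = Counter((seq[i], seq[j]) for seq in msa)  # joint pair counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


def ranked_column_pairs(msa):
    """Return (score, i, j) for every column pair, highest MI first."""
    length = len(msa[0])
    scores = [
        (mutual_information(msa, i, j), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, reverse=True)


for score, i, j in ranked_column_pairs(TOY_MSA):
    print(f"columns {i} and {j}: MI = {score:.3f}")
```

On the toy alignment, columns 0 and 2 co-vary perfectly and score ln 2 (about 0.693), while every other pair, including all pairs involving the invariant column 1, scores zero.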
© 2012 Blackwell Publishing Ltd.