Several extracellular modular proteins, including proteases of the complement and blood coagulation cascades, are shown here to exhibit conserved sequence patterns specific for a particular module-domain association. This was detected by comparative analysis of sequence variability in different multiple sequence alignments, which provides a new tool to investigate the evolution of modular proteins. A first example deals with the proteins featuring a common complement control protein (CCP) module-serine protease (SP) domain pattern at their C-terminal end, defined here as the CCP-SP sub-family. These proteins include the complement proteases C1r, C1s and MASPs, the Limulus clotting factor C, and the proteins of the haptoglobin family. A second example deals with blood coagulation factors VII, IX and X and protein C, all featuring a common epidermal growth factor (EGF)-SP C-terminal assembly. Highly specific motifs are found at the connection between the CCP or EGF module and the activation peptide of the SP domain: [P/A]-x-C-x-[P/A]-[I/V]-C-G-x-[P/S/K] in the case of the CCP-SP proteins, and C-x-[P/S]-x-x-x-[Y/F]-P-C-G in the case of the EGF-SP proteins. Each motif is strictly conserved in the whole sub-family and it is detected in no more than one other known protein sequence. Strikingly, most of the conserved residues specific to each sub-family appear to be clustered at the interface between the SP domain and the CCP or EGF module. We propose that a rigid module-domain interaction occurs in these proteins and has been conserved through evolution. The functional implications of these assemblies, underlined by such evolutionary constraints, are discussed.
Copyright 1998 Academic Press.
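
As an illustration only, the two junction motifs quoted above can be turned into regular expressions for scanning sequences. This is a minimal sketch, not the comparative alignment analysis used in the study. It assumes the usual reading of the notation: "x" is any residue, and a bracketed group such as [P/A] lists the permitted alternatives at that position. The helper names and the two toy sequences are hypothetical and are not real protein sequences.

```python
import re

# Motifs exactly as written in the abstract.
CCP_SP_MOTIF = "[P/A]-x-C-x-[P/A]-[I/V]-C-G-x-[P/S/K]"
EGF_SP_MOTIF = "C-x-[P/S]-x-x-x-[Y/F]-P-C-G"


def motif_to_regex(motif: str) -> str:
    """Convert a dash-separated motif into a regular expression string.

    Assumed conventions: 'x' matches any single residue, and
    '[A/B]' matches one residue out of the listed alternatives.
    """
    parts = []
    for element in motif.split("-"):
        if element == "x":
            parts.append(".")
        elif element.startswith("[") and element.endswith("]"):
            # "[P/S/K]" becomes the character class "[PSK]"
            parts.append("[" + element[1:-1].replace("/", "") + "]")
        else:
            # A single fixed residue such as "C" or "G"
            parts.append(element)
    return "".join(parts)


def find_motif(motif: str, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for non-overlapping matches."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Toy sequences invented for this example (hypothetical).
    toy_ccp_sp = "MKPACTPVCGLSAA"
    toy_egf_sp = "TTCEPAVDYPCGKK"
    print(motif_to_regex(CCP_SP_MOTIF), find_motif(CCP_SP_MOTIF, toy_ccp_sp))
    print(motif_to_regex(EGF_SP_MOTIF), find_motif(EGF_SP_MOTIF, toy_egf_sp))
```

A regular expression of this kind only reports where a pattern occurs. It says nothing about whether the match lies at a module-domain junction, which would still have to be checked against the domain boundaries of each protein.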