Membrane proteins often carry structural features, such as transmembrane domains (TMs), N-glycosylation, and disulfide bonds (SS bonds), that are essential to their structure and function. Here, we extend the study of sequons (the motifs that carry N-glycosylation) and of the Cys residues that support SS bonds to the whole human proteome, with a particular focus on the positions of Cys residues in human proteins relative to those of sequons and TMs. Cys is one of the least abundant amino acid residues in protein sequences, and the positions of Cys residues in proteins are not random but have been selected through evolution. We discovered that the frequency of Cys residues in proteins depends on protein length, and that the frequency of CC gaps (the gaps formed between adjacent Cys residues) can be used as a classifier to distinguish proteins with special structures and functions, such as keratin-associated proteins (KAPs), extracellular proteins with EGF-like domains, and nuclear proteins with zinc finger C2H2 domains. Most importantly, by comparing the positions of Cys residues with those of sequons and TMs, we discovered that these three structural features can form dense clusters in highly repeated and mutually exclusive modalities in protein sequences. The evolutionary advantages of such complementarity among the three structural features are discussed, particularly in light of the structural dynamics of proteins, which are missing from computational predictions. The discoveries made here highlight the sequence-structure-function axis in biological organisms, which can be utilized in future protein engineering toward synthetic biology.
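The sequence features named above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that a CC gap is the number of residues lying between two adjacent Cys residues, and that a sequon is the canonical N-X-S/T motif with X being any residue except Pro. The function names and the toy sequence are hypothetical.

```python
import re
from collections import Counter

# Canonical N-glycosylation sequon: N-X-S/T with X != P (assumed definition).
# The lookahead lets overlapping sequons (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def cys_positions(seq):
    """Return 1-based positions of Cys residues in a one-letter sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "C"]


def cc_gaps(seq):
    """Return the number of residues between each pair of adjacent Cys.

    Assumption: a gap of 0 means two directly neighbouring Cys ("CC").
    """
    pos = cys_positions(seq)
    return [b - a - 1 for a, b in zip(pos, pos[1:])]


def sequon_positions(seq):
    """Return 1-based positions of the Asn residue of each sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def cys_frequency(seq):
    """Return the fraction of residues that are Cys (0.0 for an empty sequence)."""
    return seq.count("C") / len(seq) if seq else 0.0


# Hypothetical toy sequence, not a real protein.
toy = "MCAACNGTCCPNPSC"
gap_histogram = Counter(cc_gaps(toy))  # how often each gap length occurs

print(cys_positions(toy))
print(cc_gaps(toy))
print(sequon_positions(toy))
print(round(cys_frequency(toy), 3))
```

A per-protein histogram of gap lengths, such as `gap_histogram`, is one plausible input for the kind of classifier described above; how the gap frequencies are actually binned and compared across protein families is not specified here.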
Keywords: Cysteine residues; Disulfide bonds; N-glycosylation; Posttranslational modifications; Protein sequence; Protein structure and function; Sequons; Transmembrane domains.
© 2024. The Author(s).