The rapid evolution of fertilization proteins has generated remarkable diversity in molecular structure and function. Glycoproteins of vertebrate egg coats contain 1-6 copies of the zona pellucida (ZP)-N domain, which facilitate several reproductive functions, including species-specific sperm recognition. In this report, we integrate phylogenetics and machine learning to investigate how ZP-N domains diversify in structure and function. The most C-terminal ZP-N domain of each paralog is associated with another domain type (ZP-C), and together the two form a "ZP module." All modular ZP-N domains are phylogenetically distinct from nonmodular, or free, ZP-N domains. Machine learning-based classification identifies eight residues that form a stabilizing network in modular ZP-N domains and that is absent in free domains. Positive selection is identified in some free ZP-N domains. Our findings support a model in which strong purifying selection has conserved an essential structural core in modular ZP-N domains, while relaxation of this structural constraint has allowed free N-terminal domains to functionally diversify.
Keywords: fertilization; gene duplication; machine learning; molecular evolution; phylogenetics; protein structure.
© The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.