Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments

Mauricio Macossay-Castillo; Simone Kosol; Peter Tompa; Rita Pancsa

doi:10.1371/journal.pcbi.1003607

PLoS Comput Biol. 2014 May 8;10(5):e1003607. doi: 10.1371/journal.pcbi.1003607. eCollection 2014 May.

Authors

Mauricio Macossay-Castillo¹, Simone Kosol¹, Peter Tompa², Rita Pancsa¹

Affiliations

¹ Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium.
² Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium; Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary.

Abstract

Synonymous constraint elements (SCEs) are protein-coding genomic regions with very low synonymous mutation rates that are believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements provide a good opportunity to improve our understanding of how the redundant nature of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better comply with the increased functional demands if they are structurally less constrained (i.e., intrinsically disordered). To test this idea, we investigated the protein segments encoded by SCEs with computational tools to describe the underlying structural properties. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping with experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that are significantly enriched in structural disorder and compositional bias, while they are depleted in secondary structure and domain annotations compared to reference segments of similar lengths. This tendency suggests that relaxed protein structural constraints provide an advantage when accommodating multiple overlapping functions in coding regions.
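The abstract does not name the tools used to measure sequence complexity or compositional bias. As a rough illustration only, and not the authors' method, one common proxy is the Shannon entropy of the amino-acid composition in a sliding window, where low entropy indicates a compositionally biased stretch. The function names, the window size of 12 and the entropy threshold of 2.2 bits in this sketch are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(sequence: str, window: int = 12,
                            threshold: float = 2.2) -> float:
    """Fraction of residues covered by at least one low-entropy window.

    `window` and `threshold` are illustrative assumptions, not values
    taken from the paper.
    """
    n = len(sequence)
    if n < window:
        return 0.0
    flagged = [False] * n
    for start in range(n - window + 1):
        if shannon_entropy(sequence[start:start + window]) < threshold:
            # Mark every residue in the low-entropy window.
            for i in range(start, start + window):
                flagged[i] = True
    return sum(flagged) / n


# Hypothetical toy sequences: a glutamine repeat versus a diverse segment.
biased = "Q" * 20
diverse = "ACDEFGHIKLMNPQRSTVWY"
biased_fraction = low_complexity_fraction(biased)
diverse_fraction = low_complexity_fraction(diverse)
```

In a comparison like the one described above, a score of this kind would be computed for each SCE-encoded segment and for reference segments of similar length, and the two distributions would then be compared statistically.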

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Base Sequence
Computer Simulation
Humans
Intrinsically Disordered Proteins / chemistry*
Intrinsically Disordered Proteins / genetics*
Intrinsically Disordered Proteins / ultrastructure
Models, Chemical*
Models, Genetic*
Models, Molecular*
Molecular Sequence Data
Open Reading Frames / genetics*
Structure-Activity Relationship

Substances

Intrinsically Disordered Proteins

Grants and funding

This work was supported by the Research Foundation Flanders (FWO) Odysseus grant G.0029.12 to PT; and by a fellowship from the Mexican National Council for Science and Technology (CONACYT) with reference 215503/310852 to MMC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.