Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments

PLoS Comput Biol. 2014 May 8;10(5):e1003607. doi: 10.1371/journal.pcbi.1003607. eCollection 2014 May.

Abstract

Synonymous constraint elements (SCEs) are protein-coding genomic regions with very low synonymous mutation rates that are believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements provide a good opportunity to improve our understanding of how the redundant nature of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better comply with the increased functional demands if they are structurally less constrained (i.e., intrinsically disordered). To test this idea, we used computational tools to characterize the structural properties of the protein segments encoded by SCEs. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping with experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that are significantly enriched in structural disorder and compositional bias, while they are depleted in secondary structure and domain annotations compared to reference segments of similar length. This tendency suggests that relaxed protein structural constraints provide an advantage when accommodating multiple overlapping functions in coding regions.
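
The abstract does not say which measure of sequence complexity was used. As a rough illustration only, one common proxy for compositional bias is the Shannon entropy of the residue composition in a sliding window: low entropy indicates a segment dominated by a few residue types. The sketch below is an assumption for illustration, not the authors' method; the function names and the window size of 12 are hypothetical, and established tools use more elaborate statistics.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    n = len(segment)
    if n == 0:
        return 0.0
    counts = Counter(segment)
    # Sum -p * log2(p) over the observed residue types.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(sequence, window=12):
    """Entropy of every full-length window along a protein sequence.

    The default window size is an illustrative choice, not a value
    taken from the paper.
    """
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]
```

A homopolymeric window scores 0 bits, while a window in which all 20 standard amino acids are equally frequent approaches log2(20), about 4.32 bits. Comparing such profiles between SCE-encoded segments and length-matched reference segments is the kind of contrast the abstract describes.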

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Base Sequence
  • Computer Simulation
  • Humans
  • Intrinsically Disordered Proteins / chemistry*
  • Intrinsically Disordered Proteins / genetics*
  • Intrinsically Disordered Proteins / ultrastructure
  • Models, Chemical*
  • Models, Genetic*
  • Models, Molecular*
  • Molecular Sequence Data
  • Open Reading Frames / genetics*
  • Structure-Activity Relationship

Substances

  • Intrinsically Disordered Proteins

Grants and funding

This work was supported by the Research Foundation Flanders (FWO) Odysseus grant G.0029.12 to PT; and by a fellowship from the Mexican National Council for Science and Technology (CONACYT) with reference 215503/310852 to MMC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.