Why do eukaryotic proteins contain more intrinsically disordered regions?

PLoS Comput Biol. 2019 Jul 22;15(7):e1007186. doi: 10.1371/journal.pcbi.1007186. eCollection 2019 Jul.

Abstract

Intrinsic disorder is more abundant in eukaryotic than in prokaryotic proteins. Methods for predicting intrinsic disorder are based on the amino acid sequence of a protein. Therefore, there must be an underlying difference between eukaryotic and prokaryotic protein sequences that causes the (predicted) difference in intrinsic disorder. By comparing proteins from complete eukaryotic and prokaryotic proteomes, we show that the difference in intrinsic disorder emerges from the linker regions connecting Pfam domains. Eukaryotic proteins have more extended linker regions, and in addition, the eukaryotic linkers are significantly more disordered: 38% vs. 12-16% disordered residues. Next, we examined the underlying reason for the increase in disorder in eukaryotic linkers and found that changes in the abundance of only three amino acids cause the increase. Eukaryotic proteins contain 8.6% serine, while prokaryotic proteins contain 6.5%. Eukaryotic proteins also contain 5.4% proline and 5.3% isoleucine, compared with 4.0% proline and ≈7.5% isoleucine in prokaryotes. All three of these differences contribute to the increased disorder in eukaryotic proteins. It is tempting to speculate that the increase in serine frequency in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. The differences are observed in all phyla, protein families, structural regions and types of protein but are most pronounced in disordered and linker regions. The observation that differences in the abundance of three amino acids cause the difference in disorder between eukaryotic and prokaryotic proteins raises the question: Are amino acid frequencies different in eukaryotic linkers because the linkers are more disordered, or do the differences cause the increased disorder?
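The kind of composition comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`amino_acid_frequencies`, `compare_residues`) and the toy sequences are hypothetical, chosen only to show how percentage frequencies of serine, proline and isoleucine might be computed and compared between two sets of sequences.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(sequences):
    """Return the percentage of each standard amino acid over all sequences.

    Non-standard characters (e.g. 'X', gaps) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(res for res in seq.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def compare_residues(group_a, group_b, residues=("S", "P", "I")):
    """Return the difference in percentage points (group_a minus group_b)
    for the selected residues."""
    freq_a = amino_acid_frequencies(group_a)
    freq_b = amino_acid_frequencies(group_b)
    return {aa: freq_a[aa] - freq_b[aa] for aa in residues}


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only (not real proteome data).
    toy_eukaryotic = ["MSSPSPLKSS", "MPSSTPAGSI"]
    toy_prokaryotic = ["MIKVIALGIS", "MKIAVIGPLS"]
    for aa, diff in compare_residues(toy_eukaryotic, toy_prokaryotic).items():
        print(f"{aa}: {diff:+.1f} percentage points")
```

In a real analysis the input would be sequences from complete proteomes, optionally restricted to linker or domain regions, rather than the toy strings used here.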

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry
  • Animals
  • Computational Biology
  • Databases, Protein
  • Eukaryotic Cells / metabolism*
  • Evolution, Molecular
  • Humans
  • Intrinsically Disordered Proteins / chemistry*
  • Intrinsically Disordered Proteins / genetics
  • Intrinsically Disordered Proteins / metabolism*
  • Isoleucine / chemistry
  • Prokaryotic Cells / metabolism*
  • Proline / chemistry
  • Protein Domains
  • Selection, Genetic
  • Serine / chemistry

Substances

  • Amino Acids
  • Intrinsically Disordered Proteins
  • Isoleucine
  • Serine
  • Proline

Grants and funding

This work was supported by grants from the Swedish Research Council (www.vr.se) (VR-NT 2016-03798 to AE) and the Swedish e-Science Research Center (www.e-science.se). The Swedish National Infrastructure for Computing (www.snics.se) provided computational resources. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.