Structure and non-structure of centrosomal proteins

PLoS One. 2013 May 9;8(5):e62633. doi: 10.1371/journal.pone.0062633. Print 2013.

Abstract

Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain on average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are predicted to be both disordered and coiled-coil: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% of residues predicted as helical versus 17% predicted as strands), and even more so across the whole centrosomal proteome (52% versus 7%), whereas for control human proteins 34.5% of the residues are predicted as helical and 12.8% as strands. This difference is mainly due to residues predicted as both disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix-forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we predicted to be globular. These proteins are often insoluble: only 39 of the 120 expressed full-length proteins (32%) and 19 of the 72 domains (26%) were soluble. We built or retrieved structural models for 277 of the 361 human proteins whose centrosomal localization has been experimentally verified. For the remaining 84 centrosomal proteins (23%) we could not find any suitable structural template with more than 20% sequence identity; around 74% of their residues are predicted to be disordered or coiled-coil. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.
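The coverage figures quoted in the abstract (disordered fraction, dual disordered/coiled-coil fraction, helix and strand fractions, and the disordered-and-helical fraction) are ratios over per-residue annotations. The following is a minimal Python sketch, assuming each protein already carries boolean disorder and coiled-coil predictions and a three-state secondary-structure string (H, E, C) produced by upstream predictors that the abstract does not name. The class and function names, the toy proteins, and the exact flagging rule are hypothetical illustrations, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class ResiduePredictions:
    """Hypothetical per-residue annotations for one protein.

    disorder[i] and coiled_coil[i] are booleans from some upstream predictors;
    ss[i] is a secondary-structure letter: 'H' (helix), 'E' (strand), 'C' (other).
    """
    name: str
    disorder: list[bool]
    coiled_coil: list[bool]
    ss: str

    def __post_init__(self) -> None:
        n = len(self.ss)
        if len(self.disorder) != n or len(self.coiled_coil) != n:
            raise ValueError(f"{self.name}: annotation lengths differ")


def fraction(flags: list[bool]) -> float:
    """Fraction of residues for which the flag is True (0.0 for empty input)."""
    return sum(flags) / len(flags) if flags else 0.0


def summarize(p: ResiduePredictions) -> dict[str, float]:
    """Per-residue coverage statistics of the kind quoted in the abstract."""
    helix = [c == "H" for c in p.ss]
    strand = [c == "E" for c in p.ss]
    return {
        "disordered": fraction(p.disorder),
        # Residues predicted disordered AND coiled-coil at the same time.
        "disordered_coiled_coil": fraction(
            [d and cc for d, cc in zip(p.disorder, p.coiled_coil)]
        ),
        "helix": fraction(helix),
        "strand": fraction(strand),
        # Disordered residues that are nevertheless predicted helical
        # (candidate alpha-MoRF regions).
        "disordered_helix": fraction([d and h for d, h in zip(p.disorder, helix)]),
    }


def dual_prediction_hits(proteins, threshold=0.20):
    """Names of proteins whose dual disordered/coiled-coil coverage exceeds threshold."""
    return [p.name for p in proteins
            if summarize(p)["disordered_coiled_coil"] > threshold]


def pooled_disorder(proteins) -> float:
    """Disordered fraction pooled over all residues of a protein set
    (one plausible aggregation; the abstract does not say which was used)."""
    total = sum(len(p.ss) for p in proteins)
    return sum(sum(p.disorder) for p in proteins) / total if total else 0.0


# Toy usage with two made-up 10-residue proteins.
toy = [
    ResiduePredictions("A", [True] * 6 + [False] * 4,
                       [True] * 3 + [False] * 7, "HHHHCCEECC"),
    ResiduePredictions("B", [False] * 10, [False] * 10, "HHCCEEEECC"),
]
print(summarize(toy[0]))
print(dual_prediction_hits(toy))  # ['A']
print(pooled_disorder(toy))       # 0.3
```

Whether proteome-level figures such as 57% are pooled over all residues or averaged per protein is not stated in the abstract; `pooled_disorder` shows the pooled variant, which weights long proteins more heavily than a per-protein mean would.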

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Centrosome / metabolism*
  • Databases, Protein*
  • Gene Expression
  • Humans
  • Molecular Sequence Data
  • Protein Binding
  • Protein Folding
  • Protein Isoforms / chemistry
  • Protein Isoforms / genetics
  • Protein Isoforms / metabolism
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism*
  • Proteome / chemistry
  • Proteome / genetics
  • Proteome / metabolism*
  • Signal Transduction

Substances

  • Protein Isoforms
  • Proteins
  • Proteome

Grants and funding

The authors gratefully acknowledge financial support from the Spanish Ministry of Science, grant CSD2006-00023, and from the Madrid Community, grant S2010/BMD-2305. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.