Accurate contact-based modelling of repeat proteins predicts the structure of new repeat protein families

PLoS Comput Biol. 2021 Apr 15;17(4):e1008798. doi: 10.1371/journal.pcbi.1008798. eCollection 2021 Apr.

Abstract

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features of repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of 41 families with estimated high accuracy.
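
To illustrate the distinction between intra- and inter-unit contacts mentioned in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes that predicted contacts are available as pairs of residue numbers and that repeat unit boundaries are known as inclusive (start, end) ranges; the function names `unit_index` and `classify_contacts` and the example data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Contact = Tuple[int, int]   # pair of residue numbers predicted to be in contact
Unit = Tuple[int, int]      # inclusive (start, end) residue range of one repeat unit


def unit_index(residue: int, units: List[Unit]) -> Optional[int]:
    """Return the index of the repeat unit containing the residue, or None."""
    for idx, (start, end) in enumerate(units):
        if start <= residue <= end:
            return idx
    return None


def classify_contacts(contacts: List[Contact], units: List[Unit]) -> Dict[str, List[Contact]]:
    """Split contacts into intra-unit (same unit) and inter-unit (different units).

    Contacts involving a residue outside every annotated unit are skipped.
    """
    result: Dict[str, List[Contact]] = {"intra": [], "inter": []}
    for i, j in contacts:
        ui, uj = unit_index(i, units), unit_index(j, units)
        if ui is None or uj is None:
            continue  # residue not covered by any annotated repeat unit
        result["intra" if ui == uj else "inter"].append((i, j))
    return result


# Hypothetical example: two 30-residue units and four predicted contacts.
example_units = [(1, 30), (31, 60)]
example_contacts = [(5, 20), (25, 40), (35, 50), (10, 70)]
classified = classify_contacts(example_contacts, example_units)
```

In this made-up example, the contact (25, 40) spans two units and is counted as inter-unit, while (10, 70) is dropped because residue 70 lies outside the annotated units.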

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Models, Molecular*
  • Protein Conformation
  • Proteins / chemistry*

Substances

  • Proteins

Grants and funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 823886. AE is funded by grants from the Swedish Natural Science Research Council (Vetenskapsrådet) No VR-NT 2016-03798. SNIC provided computational resources under grant agreement No SNIC 2020/5-300. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.