CASP10-BCL::Fold efficiently samples topologies of large proteins

Proteins. 2015 Mar;83(3):547-63. doi: 10.1002/prot.24733.

Abstract

During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs), omitting the more flexible loop regions early on. This approach enables the sampling of conformational space for larger proteins with more complex topologies. In preparation for CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to understand BCL::Fold's ability to sample the native topology and to identify native-like models by scoring and/or clustering approaches, as well as our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies with a GDT_TS score > 33% for 12 of 18 and with a topology score > 0.8 for 11 of 18 test cases de novo. Despite the sampling success of BCL::Fold, significant challenges still exist in the clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. We also observed that, for some β-strand proteins, model refinement failed because β-strands were not properly aligned to form hydrogen bonds, which removed otherwise accurate models from the pool. Further, BCL::Fold frequently samples non-natural topologies that require loop regions to pass through the center of the protein.
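For readers unfamiliar with the GDT_TS threshold quoted above, the following is a minimal sketch of how such a score can be computed. It is not taken from BCL::Fold or from the CASP assessment software. GDT_TS is conventionally the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of Cα atoms in the model that lie within that cutoff of the corresponding native atom, with each percentage maximized over many superpositions. The sketch assumes the model has already been superimposed on the native structure, so it evaluates only that one superposition and can underestimate the true score. The function name and the toy coordinates are hypothetical.

```python
import math

# Standard GDT_TS distance cutoffs, in angstroms.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_fixed_superposition(model_ca, native_ca):
    """Approximate GDT_TS (in percent) for one fixed superposition.

    model_ca and native_ca are equal-length sequences of (x, y, z)
    C-alpha coordinates, paired residue by residue and assumed to be
    already superimposed. The full GDT_TS additionally searches over
    superpositions; that search is omitted here.
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("model and native must have the same number of residues")
    if not native_ca:
        raise ValueError("at least one residue is required")

    # Per-residue C-alpha deviation between model and native.
    distances = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]

    # Fraction of residues within each cutoff, then the mean of the four.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in GDT_TS_CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example (hypothetical coordinates): four residues whose model
# positions deviate from the native by 0.5, 1.5, 3.0 and 9.0 angstroms.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

score = gdt_ts_fixed_superposition(model, native)
print(f"GDT_TS (fixed superposition): {score:.2f}%")
```

In this toy case, one, two, three, and three of the four residues fall within the 1, 2, 4, and 8 Å cutoffs respectively, so the score is 56.25%, which would count as above the 33% threshold used in the abstract.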

Keywords: de novo protein structure prediction; double blind benchmark; knowledge based scoring functions; loop prediction; sheet alignment.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • Models, Molecular
  • Protein Conformation
  • Protein Folding*
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins