Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2119-2130. doi: 10.1109/TCBB.2019.2917452. Epub 2020 Dec 8.

Abstract

De novo protein structure prediction can be treated as a conformational space optimization problem guided by an energy function. However, designing an accurate energy function that ensures low-energy conformations are close to native structures remains a challenge. Recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to balance exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
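As an illustration of the feature-extraction step mentioned in the abstract, the following is a minimal sketch of bisecting K-means applied to one-dimensional distance samples for a single residue pair. It is not the authors' implementation: the function names (`two_means_1d`, `bisecting_kmeans_1d`, `dominant_distance_feature`), the choice to split the cluster with the largest squared error, and the use of the most populated cluster's centre as the "feature" are all assumptions made for this example, and the input distances are made-up values.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the cluster mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def two_means_1d(values, iters=20):
    """Split 1D values into two clusters with plain 2-means iterations.

    Centres start at the minimum and maximum (an assumption, chosen so the
    example is deterministic).
    """
    c0, c1 = min(values), max(values)
    left, right = list(values), []
    for _ in range(iters):
        left = [v for v in values if abs(v - c0) <= abs(v - c1)]
        right = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not right:  # all values identical; nothing to split
            break
        c0, c1 = mean(left), mean(right)
    return left, right


def bisecting_kmeans_1d(values, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(values)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        if sse(worst) == 0:  # remaining clusters cannot be split further
            break
        clusters.remove(worst)
        clusters.extend(two_means_1d(worst))
    return sorted(clusters, key=mean)


def dominant_distance_feature(values, k):
    """Hypothetical feature: centre of the most populated cluster."""
    clusters = bisecting_kmeans_1d(values, k)
    return mean(max(clusters, key=len))


# Made-up distances (in angstroms) for one residue pair across fragments.
distances = [4.9, 5.0, 5.1, 9.8, 9.9, 10.0, 10.1, 10.2, 15.0]
clusters = bisecting_kmeans_1d(distances, k=3)
feature = dominant_distance_feature(distances, k=3)
```

In a setting like the one the abstract describes, a per-pair feature of this kind could be compared against the corresponding distance in a candidate conformation to score its similarity; how TDFO actually builds and uses its similarity model is not specified in the abstract.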

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence
  • Computational Biology / methods*
  • Models, Molecular
  • Mutation / genetics
  • Protein Conformation*
  • Proteins / chemistry*
  • Proteins / genetics

Substances

  • Proteins