Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling

Methods Mol Biol. 2023:2627:25-40. doi: 10.1007/978-1-0716-2974-1_2.

Abstract

Homology modeling was long considered a method of choice for tertiary protein structure prediction. However, it yielded models of acceptable quality only when templates with appreciable sequence identity to the target could be found. The threshold was long assumed to be around 20-30%. Below this level, the observed sequence identity came dangerously close to values that can arise by chance when aligning random, unrelated sequences. In such cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide assessments of modeling methods have brought some surprising outcomes, showing that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling that push the threshold deep into the "twilight zone", with particular attention to improvements in applications of machine learning and model evaluation.
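
To make the sequence identity threshold concrete, the following is a minimal sketch of how percent identity might be computed for an already aligned target-template pair and compared with the upper end of the 20-30% range mentioned above. The function names, the example sequences, and the choice of denominator (columns where neither sequence has a gap) are illustrative assumptions, not the chapter's method; other definitions of identity divide by the alignment length or by the shorter sequence length and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and
    therefore have equal length. The denominator used here is one of
    several common conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def in_twilight_zone(identity_pct: float, threshold_pct: float = 30.0) -> bool:
    """Flag alignments below an assumed identity threshold (default 30%)."""
    return identity_pct < threshold_pct


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    target = "MKT-AYIAKQR"
    template = "MRSLAY-GHQK"
    identity = percent_identity(target, template)
    print(f"identity = {identity:.1f}%, twilight zone: {in_twilight_zone(identity)}")
```

The 30% default is only the upper bound of the range quoted in the abstract; in practice the chance level of identity also depends on alignment length, so a fixed cutoff is a rough guide rather than a rule.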

Keywords: Ab initio modeling; CAMEO; CASP; Contact prediction; Deep learning; Homology modeling; Protein folding; Protein structure prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Databases, Protein
  • Machine Learning*
  • Protein Conformation
  • Protein Folding
  • Protein Structure, Tertiary
  • Proteins* / chemistry
  • Sequence Analysis, Protein / methods

Substances

  • Proteins