ZoomQA: residue-level protein model accuracy estimation with machine learning on sequential and 3D structural features

Brief Bioinform. 2022 Jan 17;23(1):bbab384. doi: 10.1093/bib/bbab384.

Abstract

Motivation: Estimation of Model Accuracy is a cornerstone problem in bioinformatics. As of CASP14, there are 79 global QA methods but only 39 residue-level QA methods, and very few of them work on protein complexes. Here, we introduce ZoomQA, a novel single-model method for assessing the accuracy of a predicted tertiary protein structure or complex at the residue level, a task with many applications, such as drug discovery. ZoomQA differs from other methods by considering how the chemical and physical features of a fragment structure (the portion of a protein within a radius r of the target amino acid) change as the radius of contact increases. Fourteen physical and chemical properties of amino acids are used to build a comprehensive representation of every residue within a protein and to grade its placement within the protein as a whole. Moreover, we show the potential of ZoomQA to identify problematic regions of the SARS-CoV-2 protein complex.
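The "zooming" idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`zoom_profile`, `zoom_deltas`), the toy coordinates, the radii, and the use of mean Kyte-Doolittle hydropathy as the single property are all assumptions made for illustration, since the abstract does not list the fourteen properties or the radii used.

```python
import math

# Kyte-Doolittle hydropathy for a few residues. This is ONE illustrative
# property chosen for the sketch; the fourteen properties used by ZoomQA
# are not enumerated in the abstract.
HYDROPATHY = {"A": 1.8, "L": 3.8, "K": -3.9, "D": -3.5, "G": -0.4}


def zoom_profile(coords, sequence, target, radii, scale=HYDROPATHY):
    """Mean property value of the fragment within each radius of the target.

    coords   : list of (x, y, z) positions, one per residue (hypothetical input)
    sequence : one-letter residue codes, same length as coords
    target   : index of the residue being assessed
    radii    : increasing, non-negative radii (same length unit as coords)
    """
    center = coords[target]
    dists = [math.dist(center, c) for c in coords]
    profile = []
    for r in radii:
        # The fragment always contains the target itself (distance 0).
        members = [scale[aa] for aa, d in zip(sequence, dists) if d <= r]
        profile.append(sum(members) / len(members))
    return profile


def zoom_deltas(profile):
    """Change in the property between consecutive radii."""
    return [b - a for a, b in zip(profile, profile[1:])]


# Toy example: five residues spaced 3.8 units apart along the x-axis.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
sequence = "ALKDG"
profile = zoom_profile(coords, sequence, target=0, radii=[4, 8, 12, 16])
deltas = zoom_deltas(profile)
```

Here each radius admits one more neighbor into the fragment, so `profile` records how the local chemical environment of residue 0 shifts as the view widens, and `deltas` captures the change between successive radii. A learned model would take such per-residue vectors, computed over many properties, as input features.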

Results: We benchmark ZoomQA on CASP14, where it outperforms other state-of-the-art local QA methods and rivals state-of-the-art QA methods on global prediction metrics. Our experiment demonstrates the efficacy of these new features and shows that our method matches the performance of other state-of-the-art methods without homology searches against databases or position-specific scoring matrices (PSSMs).

Availability: http://zoomQA.renzhitech.com.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Caspases / chemistry*
  • Humans
  • Machine Learning*
  • Models, Molecular*
  • Protein Structure, Quaternary
  • Protein Structure, Tertiary
  • SARS-CoV-2 / chemistry*
  • Sequence Analysis, Protein
  • Viral Proteins / chemistry*

Substances

  • Viral Proteins
  • CASP14 protein, human
  • Caspases