Predicting binding sites from unbound versus bound protein structures

Sci Rep. 2020 Sep 28;10(1):15856. doi: 10.1038/s41598-020-72906-7.

Abstract

We present the application of seven binding-site prediction algorithms to a meticulously curated dataset of ligand-bound (holo) and ligand-free (apo) crystal structures for 304 unique protein sequences (2528 crystal structures). To probe how the starting protein structure influences binding-site prediction, the dataset contains at least two ligand-bound and two ligand-free structures for each protein. We use this dataset in a brief survey of five geometry-based methods, one energy-based method, and one machine-learning-based method: Surfnet, Ghecom, LIGSITEcsc, Fpocket, Depth, AutoSite, and Kalasanty. For most methods, the distributions of F scores and Matthews correlation coefficients show no statistically significant difference in performance between ligand-bound and ligand-free structures. Only Fpocket showed a statistically significant, but small, improvement in performance on holo structures. Lastly, we found that most methods succeed on some crystal structures and fail on others within the same protein family, even though all structures are relatively high quality and show low structural variation. We had expected greater consistency across different conformations of the same sequence. Interestingly, whether a given structure succeeds or fails cannot be predicted from quality metrics such as resolution, the Cruickshank Diffraction Precision index, or unresolved residues. Cryptic sites were also examined.
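The F score and Matthews correlation coefficient (MCC) mentioned in the abstract are both computed from a confusion matrix of predicted versus true binding-site members. The sketch below shows one way to compute them. It assumes that binding sites are scored per residue, with each residue identified by an integer index. That is an illustrative assumption: the abstract does not say whether the study scores residues, atoms, or grid points. The function names are hypothetical. Returning 0.0 when a denominator is zero is a common convention, not necessarily the one used in the study.

```python
import math


def confusion_counts(predicted: set[int], actual: set[int],
                     all_residues: set[int]) -> tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for a per-residue binding-site prediction.

    Assumption (illustrative only): residues are identified by integer
    indices, and `all_residues` is every residue in the structure.
    """
    tp = len(predicted & actual)                    # predicted and truly in the site
    fp = len(predicted - actual)                    # predicted but not in the site
    fn = len(actual - predicted)                    # in the site but missed
    tn = len(all_residues - predicted - actual)     # correctly left out
    return tp, fp, fn, tn


def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score = 2TP / (2TP + FP + FN); 0.0 if undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; 0.0 if the denominator is zero (assumed convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy example: a 10-residue protein whose true site is residues 3-6.
    residues = set(range(1, 11))
    true_site = {3, 4, 5, 6}
    predicted_site = {4, 5, 6, 7}
    tp, fp, fn, tn = confusion_counts(predicted_site, true_site, residues)
    print(tp, fp, fn, tn)            # 3 1 1 5
    print(f_score(tp, fp, fn))       # 0.75
    print(mcc(tp, fp, fn, tn))       # 14/24, about 0.583
```

The F score ignores true negatives. MCC uses all four cells, so it is less inflated when, as is typical, most residues are not part of any binding site.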

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Apoproteins / chemistry
  • Apoproteins / metabolism
  • Binding Sites
  • Computational Biology*
  • Databases, Protein
  • Proteins / chemistry*
  • Proteins / metabolism*

Substances

  • Apoproteins
  • Proteins