A Probabilistic Approach in the Search Space of the Molecular Distance Geometry Problem

J Chem Inf Model. 2025 Jan 13;65(1):427-434. doi: 10.1021/acs.jcim.4c00427. Epub 2024 Nov 13.

Abstract

The discovery of the three-dimensional shape of protein molecules using interatomic distance information from nuclear magnetic resonance (NMR) can be modeled as a discretizable molecular distance geometry problem (DMDGP). Due to its combinatorial characteristics, the problem is conventionally solved in the literature as a depth-first search in a binary tree. In this work, we introduce a new search strategy, which we call frequency-based search (FBS), that for the first time utilizes geometric information contained in the Protein Data Bank (PDB). We encode the geometric configurations of 14,382 molecules derived from NMR experiments present in the PDB into binary strings. The results show that the binary strings extracted from the PDB do not follow a uniform distribution. Furthermore, we compare the runtime of the symmetry-based build-up (SBBU) algorithm (the most efficient method in the literature for solving the DMDGP) when combined with FBS and when combined with depth-first search (DFS) to find a solution, and find that FBS performs better in about 70% of the cases.
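
The intuition behind ordering the search by frequency can be illustrated with a toy sketch. It is not the authors' FBS or SBBU implementation: the function names (`dfs_order`, `frequency_order`, `attempts_until`) and the sample strings are hypothetical, and the sketch assumes only that each candidate configuration is encoded as a fixed-length binary string and that a list of previously observed strings is available. It counts how many candidates are examined before a target string is reached under each ordering.

```python
from collections import Counter
from itertools import product


def dfs_order(k):
    """All binary strings of length k, in the order a depth-first search
    that always tries branch 0 before branch 1 would reach the leaves."""
    return ["".join(bits) for bits in product("01", repeat=k)]


def frequency_order(k, observed):
    """All binary strings of length k, most frequently observed first.
    sorted() is stable, so ties and unseen strings keep their DFS order."""
    counts = Counter(observed)
    return sorted(dfs_order(k), key=lambda s: -counts[s])


def attempts_until(order, target):
    """1-based position of target in the given candidate ordering."""
    return order.index(target) + 1


# Invented toy sample standing in for strings extracted from a database.
observed = ["101", "101", "110", "101", "011"]
target = "101"

dfs_attempts = attempts_until(dfs_order(3), target)
fbs_attempts = attempts_until(frequency_order(3, observed), target)
print(dfs_attempts, fbs_attempts)
```

In this toy sample the target happens to be the most common observed string, so the frequency ordering reaches it first, while the fixed 0-before-1 ordering examines five other candidates beforehand. The advantage depends entirely on the observed distribution being non-uniform, which is the property the abstract reports for the PDB-derived strings.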

MeSH terms

  • Algorithms*
  • Databases, Protein*
  • Magnetic Resonance Spectroscopy / methods
  • Models, Molecular
  • Nuclear Magnetic Resonance, Biomolecular
  • Probability
  • Protein Conformation
  • Proteins* / chemistry

Substances

  • Proteins