Classification of ligand molecules in PDB with graph match-based structural superposition

J Struct Funct Genomics. 2016 Dec;17(4):135-146. doi: 10.1007/s10969-016-9209-x. Epub 2016 Dec 23.

Abstract

COMPLIG, a fast heuristic graph match algorithm for small molecules, was improved by adding a structural superposition step that verifies the atom-atom matching. The modified method was used to classify the small-molecule ligands in the Protein Data Bank (PDB) by their three-dimensional structures: 16,660 types of ligands were classified into 7,561 clusters. In contrast, the previous method (without structural superposition) generated 3,371 clusters from the same ligand set. The characteristic feature of the current classification system is the increased number of singleton clusters, each containing only one ligand molecule. Inspection of the ligands that are singletons in the current classification but not in the previous one suggested that the major factors for their isolation were differences in chirality, cyclic conformations, separation of substructures, and bond length. Comparisons between the current and previous classification systems revealed that the superposition-based classification was effective in clustering functionally related ligands, such as drugs targeted to specific biological processes, owing to the strictness of the atom-atom matching.
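
To illustrate why a three-dimensional check can separate ligands that a purely graph-based match groups together, the following minimal Python sketch compares two toy structures that are mirror images of each other. It is not the COMPLIG implementation and does not perform a real superposition. As a simpler stand-in it compares pairwise distances and the sign of tetrahedron volumes over already-matched atoms. The function names, the tolerances, and the toy coordinates are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def signed_volume(a, b, c, d):
    """Six times the signed volume of the tetrahedron a, b, c, d."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def same_distances(p, q, tol=0.1):
    """True if all pairwise distances between matched atoms agree within tol.

    Mirror images pass this test, so it cannot detect chirality.
    """
    pairs = combinations(range(len(p)), 2)
    return all(abs(dist(p[i], p[j]) - dist(q[i], q[j])) <= tol
               for i, j in pairs)


def same_handedness(p, q, tol=1e-6):
    """True if no non-degenerate atom quadruple flips its volume sign."""
    for quad in combinations(range(len(p)), 4):
        vp = signed_volume(*(p[i] for i in quad))
        vq = signed_volume(*(q[i] for i in quad))
        if abs(vp) > tol and vp * vq < 0:
            return False
    return True


# Hypothetical toy data: a centre atom with four substituents, listed in
# the order given by an (assumed) atom-atom matching.
ligand_a = [(0, 0, 0), (1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
# Mirror image of ligand_a, obtained by negating the x coordinate.
ligand_mirror = [(-x, y, z) for x, y, z in ligand_a]

print(same_distances(ligand_a, ligand_mirror))
print(same_handedness(ligand_a, ligand_mirror))
```

In this toy case the distance test accepts the pair while the handedness test rejects it, which is the kind of distinction (here, chirality) that a stricter geometric verification can add on top of a connectivity-only match.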

Keywords: Bioinformatics; Drug design; Graph match; Protein ligand; Structural superposition.

MeSH terms

  • Algorithms*
  • Binding Sites
  • Cluster Analysis
  • Databases, Protein*
  • Ligands
  • Models, Molecular
  • Protein Conformation*
  • Proteins / chemistry

Substances

  • Ligands
  • Proteins