Discovering large conserved functional components in global network alignment by graph matching

BMC Genomics. 2018 Sep 24;19(Suppl 7):670. doi: 10.1186/s12864-018-5027-9.

Abstract

Background: Aligning protein-protein interaction (PPI) networks is important for discovering functionally conserved substructures across species. In recent years, the global PPI network alignment problem, which seeks the one-to-one alignment with the maximum matching score, has been studied extensively. However, finding large conserved components remains challenging because the problem is NP-hard.

Results: We propose GMAlign, a new graph matching method for global PPI network alignment. It first selects pairs of important proteins as seeds and gradually expands them to obtain an initial matching; it then iteratively refines the current result, based on the vertex cover, to obtain an optimal alignment. We compare GMAlign with state-of-the-art methods on PPI network pairs obtained from the largest BioGRID dataset and validate its performance. The results show that our algorithm produces larger alignments and, for the first time, finds larger and denser common connected subgraphs. GMAlign also achieves high-quality biological results, as measured by the functional consistency and semantic similarity of Gene Ontology terms. Moreover, in detecting large conserved biological pathways between species, GMAlign achieves better results that are both structurally and biologically meaningful.
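
The seed-and-expand stage described above can be illustrated with a minimal Python sketch. This is not the GMAlign implementation: the function names (build_graph, seed_and_extend, conserved_edges), the use of vertex degree as a stand-in for protein "importance", and the greedy scoring rule are all assumptions made for illustration, and the vertex-cover-based refinement step is omitted because the abstract does not specify it.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def seed_and_extend(g1, g2, num_seeds=1):
    """Greedy one-to-one alignment of g1 onto g2 (illustrative only).

    Assumption: 'important' proteins are approximated by highest degree.
    """
    # Seed selection: pair the top-degree vertices of both networks.
    by_deg1 = sorted(g1, key=lambda v: (-len(g1[v]), v))
    by_deg2 = sorted(g2, key=lambda v: (-len(g2[v]), v))
    mapping = dict(zip(by_deg1[:num_seeds], by_deg2[:num_seeds]))
    used = set(mapping.values())

    # Expansion: repeatedly add the unmatched pair (u, v) whose already
    # matched neighbours agree the most, until no pair conserves an edge.
    while True:
        best, best_score = None, 0
        for u in sorted(g1):
            if u in mapping:
                continue
            matched_nbrs = {mapping[n] for n in g1[u] if n in mapping}
            if not matched_nbrs:
                continue
            for v in sorted(g2):
                if v in used:
                    continue
                score = len(matched_nbrs & g2[v])
                if score > best_score:
                    best, best_score = (u, v), score
        if best is None:
            break
        mapping[best[0]] = best[1]
        used.add(best[1])
    return mapping


def conserved_edges(g1, g2, mapping):
    """Count edges of g1 whose aligned endpoints are also adjacent in g2."""
    count = 0
    for u in mapping:
        for n in g1[u]:
            # u < n ensures each undirected edge is counted once.
            if n in mapping and u < n and mapping[n] in g2[mapping[u]]:
                count += 1
    return count


# Toy networks with identical topology (hypothetical protein names).
g1 = build_graph([("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")])
g2 = build_graph([("w", "x"), ("x", "y"), ("y", "z"), ("x", "z")])
alignment = seed_and_extend(g1, g2)
print(alignment, conserved_edges(g1, g2, alignment))
```

On these two toy networks the greedy expansion starts from the single highest-degree pair and grows the alignment one protein pair at a time. A refinement pass, such as the vertex-cover-based one the authors describe, would then revisit pairs that the greedy stage matched poorly.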

Conclusions: GMAlign is a novel global network alignment tool for discovering large conserved functional components between PPI networks. It also has many potential biological applications, such as discovering conserved pathways and protein complexes across species. The GMAlign software and datasets are available at https://github.com/yzlwhu/GMAlign .

Keywords: Graph matching; Graph theory; Protein-protein interaction network.

MeSH terms

  • Algorithms*
  • Animals
  • Computational Biology / methods*
  • Computer Graphics
  • Gene Ontology
  • Gene Regulatory Networks*
  • Humans
  • Models, Theoretical
  • Protein Interaction Mapping*
  • Proteins / genetics
  • Proteins / metabolism*

Substances

  • Proteins