Exploring the roles of cannot-link constraint in community detection via Multi-variance Mixed Gaussian Generative Model

PLoS One. 2017 Jul 5;12(7):e0178029. doi: 10.1371/journal.pone.0178029. eCollection 2017.

Abstract

Driven by the demand for better performance and the availability of prior information, semi-supervised community detection with pairwise constraints has become a hot topic. Most existing methods successfully encode must-link constraints but neglect their opposite, the cannot-link constraints, which force pairs of nodes into different communities. In this paper, we aim to understand the role of cannot-link constraints and to encode both kinds of pairwise constraints effectively. To these ends, we define an integral generative process that jointly considers the network topology, must-link constraints, and cannot-link constraints. We characterize this process as a Multi-variance Mixed Gaussian Generative (MMGG) Model, which accommodates the different degrees of confidence attached to the network topology and to the pairwise constraints, and we formulate it as a weighted nonnegative matrix factorization problem. Experiments on artificial and real-world networks not only illustrate the superiority of the proposed MMGG but also, most importantly, reveal the roles of pairwise constraints: although must-link constraints are more important than cannot-link constraints when only one of the two types is available, the two types are equally important when both are available. To the best of our knowledge, this is the first work to discover and explore the importance of cannot-link constraints in semi-supervised community detection.
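As a rough illustration of how a multi-variance Gaussian model can lead to a weighted nonnegative matrix factorization, the sketch below treats each entry of a target matrix as Gaussian around a low-rank reconstruction, so that the negative log-likelihood becomes a weighted squared error with weights proportional to the inverse variances. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-factor form X ≈ U Vᵀ, the generic weighted multiplicative updates, the function names, the weight values `w_ml` and `w_cl`, and the toy graph.

```python
import numpy as np


def build_target_and_weights(A, must_link, cannot_link, w_ml=5.0, w_cl=5.0):
    """Overlay pairwise constraints on an adjacency matrix.

    Hypothetical helper: must-link pairs get target 1, cannot-link pairs get
    target 0, and constrained entries receive a larger weight (interpreted as
    a smaller Gaussian variance, i.e. higher confidence).
    """
    X = A.astype(float).copy()
    W = np.ones_like(X)  # weight 1 for plain topology entries (assumed)
    for i, j in must_link:
        X[i, j] = X[j, i] = 1.0
        W[i, j] = W[j, i] = w_ml
    for i, j in cannot_link:
        X[i, j] = X[j, i] = 0.0
        W[i, j] = W[j, i] = w_cl
    return X, W


def weighted_loss(X, W, U, V):
    """Weighted squared reconstruction error sum_ij W_ij (X_ij - (U V^T)_ij)^2."""
    return float(np.sum(W * (X - U @ V.T) ** 2))


def weighted_nmf(X, W, k, n_iter=200, seed=0, eps=1e-9):
    """Generic weighted NMF with multiplicative updates (not the paper's solver)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, k)) + 0.1  # strictly positive initialization
    V = rng.random((n, k)) + 0.1
    WX = W * X
    history = []
    for _ in range(n_iter):
        # Element-wise multiplicative updates keep U and V nonnegative.
        U *= (WX @ V) / ((W * (U @ V.T)) @ V + eps)
        V *= (WX.T @ U) / ((W * (U @ V.T)).T @ U + eps)
        history.append(weighted_loss(X, W, U, V))
    return U, V, history


# Toy graph (assumed): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

X, W = build_target_and_weights(A, must_link=[(0, 2)], cannot_link=[(2, 3)])
U, V, history = weighted_nmf(X, W, k=2)

# One simple way to read off communities: the strongest factor per node.
labels = np.argmax(U + V, axis=1)
print(labels, history[0], history[-1])
```

Raising `w_ml` or `w_cl` corresponds to assuming a smaller variance for the corresponding constraint type, which is one way to experiment with how strongly each kind of constraint shapes the factorization.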

MeSH terms

  • Algorithms*
  • Community Networks*
  • Humans
  • Information Dissemination / methods
  • Models, Theoretical*
  • Multivariate Analysis*
  • Normal Distribution*
  • Social Behavior
  • Social Media / statistics & numerical data
  • Social Networking