A FRAMEWORK FOR ATTRIBUTE-BASED COMMUNITY DETECTION WITH APPLICATIONS TO INTEGRATED FUNCTIONAL GENOMICS

Pac Symp Biocomput. 2016:21:69-80.

Abstract

Understanding community structure in networks has received considerable attention in recent years. Detecting and leveraging community structure holds promise for understanding, and potentially intervening in, the spread of influence. Network features of this type have important implications in a number of research areas, including marketing, social networks, and biology. However, the overwhelming majority of traditional approaches to community detection cannot readily incorporate information on node attributes. Integrating structural and attribute information is a major challenge. We propose a flexible iterative method, inverse regularized Markov Clustering (irMCL), for network clustering via manipulation of the transition probability matrix (also known as stochastic flow) corresponding to a graph. Similar to traditional Markov Clustering, irMCL iterates between "expand" and "inflate" operations, which aim to strengthen the intra-cluster flow while weakening the inter-cluster flow. Attribute information is directly incorporated into the iterative method through a sigmoid (logistic) function that naturally dampens attribute influence that contradicts the stochastic flow through the network. We demonstrate the advantages and flexibility of our approach using simulations and real data. We highlight an application that integrates a breast cancer gene expression data set with a functional network defined via KEGG pathways to reveal significant modules for survival.
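
The "expand" and "inflate" operations mentioned in the abstract can be illustrated with a minimal sketch of traditional Markov Clustering only. This is not irMCL: the abstract does not specify how the sigmoid attribute term or the inverse regularization enters the iteration, so that part is deliberately omitted. All function names, the self-loop convention, the inflation parameter of 2.0, and the toy graph are assumptions made for illustration.

```python
# Sketch of traditional Markov Clustering (MCL), NOT the irMCL method itself.
# Assumptions: column-stochastic flow matrix, self-loops added to every node,
# expansion by squaring, inflation by an elementwise power (default 2.0).

def normalize_columns(m):
    """Rescale each column to sum to 1 (a column-stochastic flow matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, power):
    """Inflation: raise entries to a power, then renormalize the columns.
    Strong flows gain weight relative to weak flows."""
    return normalize_columns([[x ** power for x in row] for row in m])

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expand and inflate until the flow matrix stops changing."""
    n = len(adjacency)
    # Add a self-loop to each node before normalizing.
    flow = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        updated = inflate(expand(flow), inflation)
        delta = max(abs(updated[i][j] - flow[i][j])
                    for i in range(n) for j in range(n))
        flow = updated
        if delta < tol:
            break
    # Rows that still carry flow act as attractors; the columns with flow
    # into an attractor row form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if flow[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))
```

Raising the inflation parameter generally yields finer clusters and lowering it yields coarser ones. An attribute-aware variant such as irMCL would additionally modify the flow matrix using node attributes at some point in this loop, in a way this sketch does not attempt to reproduce.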

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Breast Neoplasms / genetics
  • Cluster Analysis
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Computer Simulation
  • Female
  • Gene Expression Profiling / statistics & numerical data
  • Gene Regulatory Networks
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Humans
  • Logistic Models
  • Markov Chains
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Signal Transduction / genetics
  • Stochastic Processes
  • Systems Integration