Integration of biological data via NMF for identification of human disease-associated gene modules through multi-label classification

PLoS One. 2024 Dec 12;19(12):e0305503. doi: 10.1371/journal.pone.0305503. eCollection 2024.

Abstract

Proteins associated with multiple diseases often interact, forming disease modules that are critical for understanding disease mechanisms. This study integrates two biological sources of information, protein-protein interactions (PPIs) and Gene Ontology data, using non-negative matrix factorization (NMF) to identify gene modules associated with human diseases and to find connections between novel genes and diseases. Each data source is first converted into a network, which is then clustered to obtain modules. The two types of modules are then integrated through an NMF-based technique to obtain a set of meta-modules that preserve the essential characteristics of the interaction patterns and the functional similarity among the proteins/genes. Each meta-module is labeled based on its statistical and biological properties, and a multi-label classification technique is employed to assign new disease labels to genes. We identified 3,131 gene-disease associations, which were validated through a literature review, Gene Ontology analysis, and pathway analysis.
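The pipeline summarized in the abstract (combining module memberships from the two networks, factorizing them with NMF into meta-modules, then propagating disease labels) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each clustering is encoded as a binary gene-by-module membership matrix and that the two matrices are concatenated column-wise, uses the standard Lee-Seung multiplicative-update NMF, assigns each gene to its highest-loading meta-module, and applies a hypothetical majority-fraction rule (`min_frac`) for multi-label assignment. All function names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Set

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (a: n x m, b: m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frobenius_error(x: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))


def nmf(x: Matrix, k: int, iters: int = 300, seed: int = 0, eps: float = 1e-9):
    """Lee-Seung multiplicative updates for X ~ W H with W, H >= 0 (assumed variant)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H); eps avoids division by zero
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def meta_modules(w: Matrix) -> List[int]:
    """Hard-assign each gene to the meta-module with the largest loading in W."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


def propagate_labels(assign: List[int], known: Dict[int, Set[str]],
                     min_frac: float = 0.5) -> Dict[int, Set[str]]:
    """Hypothetical multi-label rule: a gene gets every disease carried by at least
    min_frac of the labeled genes in its meta-module; only NEW labels are returned."""
    members: Dict[int, List[int]] = {}
    for gene, mod in enumerate(assign):
        members.setdefault(mod, []).append(gene)
    predicted: Dict[int, Set[str]] = {}
    for gene, mod in enumerate(assign):
        labeled = [g for g in members[mod] if g in known]
        counts: Dict[str, int] = {}
        for g in labeled:
            for d in known[g]:
                counts[d] = counts.get(d, 0) + 1
        enriched = {d for d, c in counts.items() if c / len(labeled) >= min_frac}
        predicted[gene] = enriched - known.get(gene, set())
    return predicted


# Toy data: PPI modules P0={0,1,2}, P1={3,4,5}; GO modules G0={0,1,2}, G1={3,4,5}.
ppi = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
go = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
x = [p + g for p, g in zip(ppi, go)]  # columns: P0, P1, G0, G1
w, h = nmf(x, k=2)
assign = meta_modules(w)
known = {0: {"diseaseA"}, 1: {"diseaseA"}, 3: {"diseaseB"}}
print(assign, propagate_labels(assign, known))
```

Hard-assigning each gene to its largest loading in W is a simplification; the abstract describes labeling meta-modules by statistical and biological properties, which this toy majority rule does not attempt to reproduce.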

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Disease / classification
  • Disease / genetics
  • Gene Ontology*
  • Gene Regulatory Networks
  • Genetic Predisposition to Disease
  • Humans
  • Protein Interaction Mapping
  • Protein Interaction Maps / genetics

Grants and funding

The author(s) received no specific funding for this work.