Organization of gene programs revealed by unsupervised analysis of diverse gene-trait associations

Nucleic Acids Res. 2022 Aug 26;50(15):e87. doi: 10.1093/nar/gkac413.

Abstract

Genome-wide association studies provide statistical measures of gene-trait associations that reveal how genetic variation influences phenotypes. This study develops an unsupervised dimensionality reduction method called UnTANGLeD (Unsupervised Trait Analysis of Networks from Gene Level Data), which organizes 16,849 genes into discrete gene programs by measuring the statistical association between genetic variants and 1,393 diverse complex traits. UnTANGLeD reveals 173 gene clusters enriched for protein-protein interactions and highly distinct biological processes governing development, signalling, disease, and homeostasis. We also identify diverse gene networks that have robust interactions but are not associated with known biological processes. Analysis of independent disease traits shows that UnTANGLeD gene clusters are conserved across all complex traits, providing a simple and powerful framework to predict novel gene candidates and programs influencing orthogonal disease phenotypes. Collectively, this study demonstrates that gene programs co-ordinately orchestrating cell functions can be identified without reliance on prior knowledge, providing a method for use in functional annotation, hypothesis generation, machine learning and prediction algorithms, and the interpretation of diverse genomic data.
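The general idea of grouping genes by their association profiles across traits can be illustrated with a toy sketch. This is not the UnTANGLeD algorithm, which the abstract does not specify. It assumes a small, invented gene-by-trait score matrix and swaps in a deliberately simple technique: cosine similarity between gene profiles, followed by single-linkage grouping at a fixed threshold. The gene names, scores, threshold, and the `cosine` and `cluster_genes` helpers are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_genes(scores, threshold=0.9):
    """Group genes whose trait-association profiles are similar.

    scores: dict mapping gene name -> list of association scores,
            one per trait (hypothetical values, same trait order).
    Genes are linked when their cosine similarity is >= threshold;
    clusters are the connected components of that link graph
    (single linkage), tracked with a union-find structure.
    """
    genes = sorted(scores)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if cosine(scores[g1], scores[g2]) >= threshold:
                parent[find(g1)] = find(g2)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())


# Invented scores for four genes across four traits (illustration only).
toy_scores = {
    "GENE_A": [5.0, 4.5, 0.1, 0.0],
    "GENE_B": [4.8, 5.1, 0.0, 0.2],
    "GENE_C": [0.1, 0.0, 6.0, 5.5],
    "GENE_D": [0.0, 0.3, 5.8, 6.1],
}

clusters = cluster_genes(toy_scores)
print(clusters)
```

On this toy input, the two genes associated with the first pair of traits fall into one group and the two associated with the second pair fall into another. A real analysis at the scale described (16,849 genes by 1,393 traits) would need a matrix library and a more principled clustering step than a fixed similarity threshold.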

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Disease / genetics
  • Gene Regulatory Networks*
  • Genome-Wide Association Study* / methods
  • Genomics / methods
  • Phenotype
  • Polymorphism, Single Nucleotide