Pattern discovery of multivariate phenotypes by association rule mining and its scheme for genome-wide association studies

Int J Data Min Bioinform. 2012;6(5):505-20.

Abstract

Genome-wide association studies (GWAS) have played a crucial role in identifying disease susceptibility loci for single traits. However, GWAS have been limited in measuring genetic risk factors for multivariate phenotypes that arise from the pleiotropic effects of genetic loci. This work reports a data mining approach that discovers patterns of multivariate phenotypes expressed as association rules, and presents an analytical scheme for GWAS of these newly defined multivariate phenotypes. We identified 13 SNPs for four genes (CSMD1, NFE2L1, CBX1, and SKAP1) associated with a new multivariate phenotype defined as low levels of low-density lipoprotein cholesterol (LDL-C ≤ 100 mg/dl) and high levels of triglycerides (TG ≥ 180 mg/dl). Compared with a traditional approach to GWAS, using the discovered multivariate phenotypes can be advantageous for identifying pleiotropic genetic risk factors, which may have a common etiological role in the multivariate phenotypes.
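
To make the idea of a phenotype pattern expressed as an association rule concrete, the following is a minimal sketch, not the authors' implementation. It scores a rule with the standard support, confidence, and lift measures and derives a binary case/control label from the two thresholds given in the abstract. The toy records, the helper names (low_ldl, high_tg, rule_metrics), and the direction of the rule (low LDL-C implies high TG) are all assumptions made for illustration; the abstract does not specify the mining algorithm or its parameters.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]

# Hypothetical toy records in mg/dl; these are NOT data from the study.
subjects: List[Record] = [
    {"ldl_c": 95.0, "tg": 210.0},
    {"ldl_c": 88.0, "tg": 190.0},
    {"ldl_c": 99.0, "tg": 120.0},
    {"ldl_c": 130.0, "tg": 200.0},
    {"ldl_c": 145.0, "tg": 110.0},
    {"ldl_c": 160.0, "tg": 95.0},
    {"ldl_c": 100.0, "tg": 180.0},
    {"ldl_c": 120.0, "tg": 150.0},
]


def low_ldl(r: Record) -> bool:
    """Threshold from the abstract: LDL-C <= 100 mg/dl."""
    return r["ldl_c"] <= 100.0


def high_tg(r: Record) -> bool:
    """Threshold from the abstract: TG >= 180 mg/dl."""
    return r["tg"] >= 180.0


def rule_metrics(
    records: List[Record],
    antecedent: Callable[[Record], bool],
    consequent: Callable[[Record], bool],
) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(records)
    n_a = sum(1 for r in records if antecedent(r))
    n_c = sum(1 for r in records if consequent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / n if n else 0.0
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n and n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Assumed rule direction: {low LDL-C} -> {high TG}.
metrics = rule_metrics(subjects, low_ldl, high_tg)

# Binary multivariate phenotype: 1 if both conditions hold, else 0.
phenotype = [1 if low_ldl(r) and high_tg(r) else 0 for r in subjects]

print(metrics)
print(phenotype)
```

In a scheme of this kind, the resulting 0/1 vector could serve as the case/control label in a per-SNP association test, in place of a single-trait phenotype.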

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromobox Protein Homolog 5
  • Genetic Loci
  • Genome, Human
  • Genome-Wide Association Study / methods*
  • Humans
  • Phenotype*
  • Polymorphism, Single Nucleotide
  • Triglycerides / genetics

Substances

  • CBX1 protein, human
  • Triglycerides
  • Chromobox Protein Homolog 5