PACK: Profile Analysis using Clustering and Kurtosis to find molecular classifiers in cancer

Bioinformatics. 2006 Sep 15;22(18):2269-75. doi: 10.1093/bioinformatics/btl174. Epub 2006 May 8.

Abstract

Motivation: Elucidating the molecular taxonomy of cancers and finding biological and clinical markers from microarray experiments is problematic due to the large number of variables being measured. Feature selection methods that can identify relevant classifiers or that can remove likely false positives prior to supervised analysis are therefore desirable.

Results: We present a novel feature selection procedure based on a mixture model and a non-Gaussianity measure of a gene's expression profile. Depending on the sign of kurtosis, the method finds genes that define either small outlier subgroups (positive kurtosis) or major subdivisions (negative kurtosis). It can also serve as a filtering step prior to supervised analysis, to reduce the false discovery rate. We validate our methodology on six independent datasets by rediscovering major classifiers in ER-negative and ER-positive breast cancer and in prostate cancer. Furthermore, our method finds two novel subtypes within the basal subgroup of ER-negative breast tumours, associated with apoptotic and immune response functions, respectively, and with statistically different clinical outcomes.
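To make the kurtosis-based selection concrete, here is a minimal Python sketch. It assumes a genes x samples expression matrix. It is not the pack function from vabayelMix. The helper names (excess_kurtosis, best_two_split, pack_like_filter), the thresholds pos_threshold and neg_threshold, and the exact one-dimensional two-cluster split (used here instead of the paper's mixture model) are all hypothetical illustrations.

```python
import numpy as np

def excess_kurtosis(x):
    # Moment-based sample excess kurtosis, m4 / m2^2 - 3 (0 for a Gaussian).
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    if m2 == 0:
        return 0.0  # constant profile: no information, treat as non-selected
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2 - 3.0

def best_two_split(x):
    # Hypothetical stand-in for the mixture model: an exact 1-D two-cluster split.
    # Try every cut point on the sorted profile and keep the one with the
    # smallest within-cluster sum of squares; return the midpoint threshold.
    xs = np.sort(np.asarray(x, dtype=float))
    best_cost, best_cut = np.inf, None
    for k in range(1, len(xs)):
        left, right = xs[:k], xs[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_cost, best_cut = cost, (xs[k - 1] + xs[k]) / 2
    return best_cut

def pack_like_filter(expr, pos_threshold=1.0, neg_threshold=-1.0):
    # Keep genes whose profiles are clearly non-Gaussian; the thresholds are
    # illustrative assumptions, not values taken from the paper.
    selected = []
    for g, profile in enumerate(np.asarray(expr, dtype=float)):
        k = excess_kurtosis(profile)
        if k > pos_threshold:
            kind = "outlier_subgroup"  # heavy tails: a small subset of samples deviates
        elif k < neg_threshold:
            kind = "major_split"       # platykurtic/bimodal: two large groups
        else:
            continue                   # near-Gaussian profile: filtered out
        cut = best_two_split(profile)
        n_high = int((profile > cut).sum())
        selected.append({
            "gene": g,
            "kurtosis": float(k),
            "type": kind,
            "cut": float(cut),
            "minority_size": min(n_high, len(profile) - n_high),
        })
    return selected

# Toy matrix: 3 genes x 20 samples (made-up values for illustration only).
expr = np.array([
    [0.0] * 10 + [10.0] * 10,   # balanced bimodal gene
    [0.0] * 18 + [10.0] * 2,    # small high-expressing subgroup
    [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4],  # near-Gaussian
])
for hit in pack_like_filter(expr):
    print(hit["gene"], hit["type"], round(hit["kurtosis"], 2), hit["minority_size"])
# 0 major_split -2.0 10
# 1 outlier_subgroup 5.11 2
```

The third gene has excess kurtosis -0.5, so it is dropped. This mirrors the idea of discarding profiles unlikely to define any subgroup before supervised analysis.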

Availability: An R function, pack, implementing the methods used here has been added to the vabayelMix package, available from CRAN (www.cran.r-project.org).

Supplementary information: Supplementary information is available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence
  • Biomarkers, Tumor / analysis*
  • Breast Neoplasms / diagnosis
  • Breast Neoplasms / metabolism*
  • Cluster Analysis
  • Diagnosis, Computer-Assisted / methods
  • Female
  • Gene Expression Profiling / methods
  • Humans
  • Male
  • Neoplasm Proteins / analysis*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods
  • Prostatic Neoplasms / diagnosis
  • Prostatic Neoplasms / metabolism*
  • Software*

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins