Weighted clustering of called array CGH data

Biostatistics. 2008 Jul;9(3):484-500. doi: 10.1093/biostatistics/kxm048. Epub 2007 Dec 22.

Abstract

Array comparative genomic hybridization (aCGH) is a laboratory technique to measure chromosomal copy number changes. A clear biological interpretation of the measurements is obtained by mapping these onto an ordinal scale with categories loss/normal/gain of a copy. The pattern of gains and losses harbors a level of tumor specificity. Here, we present WECCA (weighted clustering of called aCGH data), a method for weighted clustering of samples on the basis of the ordinal aCGH data. Two similarities to be used in the clustering and particularly suited for ordinal data are proposed, which are generalized to deal with weighted observations. In addition, a new form of linkage, especially suited for ordinal data, is introduced. In a simulation study, we show that the proposed clustering method is competitive with clustering using the continuous data. We illustrate WECCA using an application to a breast cancer data set, where WECCA finds a clustering that relates better to survival than the original one.
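
The general idea of clustering samples on weighted, called data can be illustrated with a minimal sketch. It is not the WECCA method: the abstract does not define its two similarities or its new linkage, so the sketch substitutes a plain weighted agreement score (which ignores the ordering of loss/normal/gain) and ordinary average linkage. The function names, the coding of calls as -1/0/1, the per-feature weights, and the toy data are all assumptions made for illustration.

```python
# Illustrative stand-in only: weighted agreement + average linkage,
# NOT the similarities or linkage proposed for WECCA.
# Calls are assumed to be coded -1 (loss), 0 (normal), 1 (gain).

def weighted_agreement(calls_a, calls_b, weights):
    """Weighted fraction of features on which two call profiles agree."""
    if not (len(calls_a) == len(calls_b) == len(weights)):
        raise ValueError("profiles and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    agree = sum(w for a, b, w in zip(calls_a, calls_b, weights) if a == b)
    return agree / total


def similarity_matrix(profiles, weights):
    """Pairwise weighted agreement between all sample profiles."""
    n = len(profiles)
    return [[weighted_agreement(profiles[i], profiles[j], weights)
             for j in range(n)] for i in range(n)]


def cluster_average_linkage(sim, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters
    with the highest average pairwise similarity."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: four samples, four features (e.g. genomic regions).
profiles = [
    [1, 1, 0, -1],
    [1, 1, 0, 0],
    [-1, 0, 0, 1],
    [-1, 0, 1, 1],
]
weights = [2.0, 1.0, 1.0, 1.0]  # the first feature counts double

sim = similarity_matrix(profiles, weights)
clusters = cluster_average_linkage(sim, n_clusters=2)
print(clusters)
```

Giving the first feature a larger weight makes agreement on it count more toward the similarity, which is the role weights play in this sketch. A similarity that respects the ordinal scale would additionally treat a loss/gain disagreement as more severe than a loss/normal one.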

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / genetics
  • Chromosome Mapping
  • Cluster Analysis*
  • Computer Simulation
  • Cytogenetic Analysis / methods*
  • Cytogenetic Analysis / statistics & numerical data
  • DNA, Neoplasm / analysis
  • Discriminant Analysis
  • Female
  • Fuzzy Logic
  • Gene Dosage*
  • Gene Expression Profiling / methods
  • Gene Expression Profiling / statistics & numerical data
  • Genetic Markers
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods
  • Survival Analysis
  • Weights and Measures*

Substances

  • DNA, Neoplasm
  • Genetic Markers