Interactive data analysis and clustering of genomic data

Neural Netw. 2008 Mar-Apr;21(2-3):368-78. doi: 10.1016/j.neunet.2007.12.026. Epub 2007 Dec 31.

Abstract

In this work, a new clustering approach is used to explore a well-known dataset [Whitfield, M. L., Sherlock, G., Saldanha, A. J., Murray, J. I., Ball, C. A., Alexander, K. E., et al. (2002). Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Molecular Biology of the Cell, 13, 1977-2000] of time-dependent gene expression profiles in the human cell cycle. The approach is a multi-step procedure: after preprocessing, parameters are chosen using data subsampling and stability measures; for each model, several clustering solutions are obtained by random initialization and selected on the basis of a similarity measure and a figure of merit; finally, the selected solutions are tuned by evaluating a reliability measure. Three clustering models are compared: K-means, Self-Organizing Maps, and Probabilistic Principal Surfaces. The comparative analysis considers the similarity between the best solutions obtained with the three methods, the absolute distortion value, and validation with Gene Ontology (GO) annotations. The GO annotations are used to assess the significance of the obtained clusters and to compare the results with those of the work cited above.
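
As an illustration of the "several solutions by random initialization, then selection" step, the following is a minimal sketch and not the authors' implementation. It assumes plain K-means, uses total squared-error distortion as a stand-in for the figure of merit, and uses the Rand index as a stand-in for the similarity measure; the abstract does not specify either measure. The function names (`kmeans`, `rand_index`, `best_of_runs`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's K-means from a random start; returns (labels, distortion)."""
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    distortion = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, distortion


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def best_of_runs(points, k, n_runs=10, seed=0):
    """Run K-means n_runs times; keep the lowest-distortion solution.

    Also returns the mean Rand index between the kept solution and every
    run (itself included), as a rough indicator of how reproducible it is.
    """
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_runs)]
    best_labels, best_distortion = min(runs, key=lambda r: r[1])
    agreement = sum(rand_index(best_labels, l) for l, _ in runs) / n_runs
    return best_labels, best_distortion, agreement


# Toy data (assumed for illustration): two well-separated groups in 2D.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
labels, distortion, agreement = best_of_runs(toy, k=2)
```

In a setting like the one described, each row would be a gene's expression profile over time rather than a 2D point, and the same selection could be repeated for each clustering model before comparing the best solutions across models.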

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Cell Cycle / genetics
  • Cluster Analysis*
  • Gene Expression Profiling*
  • Genome*
  • Humans
  • Pattern Recognition, Automated
  • Statistics as Topic*