Simultaneous class discovery and classification of microarray data using spectral analysis

Peng Qiu; Sylvia K Plevritis

doi:10.1089/cmb.2008.0227

J Comput Biol. 2009 Jul;16(7):935-44. doi: 10.1089/cmb.2008.0227.

Authors

Peng Qiu¹, Sylvia K Plevritis

Affiliation

¹ Department of Radiology, Stanford University, Stanford, California 94305, USA.

Abstract

Classification methods are commonly divided into two categories: unsupervised and supervised. Unsupervised methods can discover new classes by grouping data into clusters or tree structures without using the class labels, but they risk producing uninterpretable results. Supervised methods, on the other hand, always find decision rules that discriminate samples with different class labels. However, the class labels play such an important role that they confine supervised methods to the classes already defined. Consequently, supervised methods cannot discover new classes. To overcome the limitations of both approaches, we propose a new method that gives the class labels a less important role so that class discovery and classification can be performed simultaneously. The proposed method is called SPACC (SPectral Analysis for Class discovery and Classification). In SPACC, the training samples are nodes of an undirected weighted network. Using spectral analysis, SPACC iteratively partitions the network into a top-down binary tree. Each partitioning step is unsupervised, and the class labels are used only to define the stopping criterion. When the partitioning ends, the training samples have been divided into several subsets, each corresponding to one class label. Because multiple subsets can correspond to the same class label, SPACC may identify biologically meaningful subclasses and minimize the impact of outliers and mislabeled data. We demonstrate the effectiveness of SPACC for class discovery and classification on microarray data of lymphomas and leukemias. SPACC software is available at http://icbp.stanford.edu/software/SPACC/.
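
The partitioning procedure described in the abstract can be illustrated with a minimal sketch. This is not the SPACC software. The abstract does not specify the edge weights, the form of the graph Laplacian, or the exact stopping criterion, so the sketch assumes a Gaussian similarity with a hypothetical width `sigma`, an unnormalized Laplacian split by the sign of its second eigenvector (the Fiedler vector), and a stopping rule that halts once every sample in a subset carries the same label. The function names `fiedler_split` and `partition` and the toy data are also illustrative.

```python
import numpy as np


def fiedler_split(X, sigma=1.0):
    """Split the rows of X in two by the sign of the Fiedler vector."""
    # Assumption: edge weights are a Gaussian kernel on squared Euclidean distance.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                # no self-loops
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    return vecs[:, 1] >= 0                  # second eigenvector = Fiedler vector


def partition(X, labels, idx=None, sigma=1.0):
    """Return the leaves of the binary tree as a list of index arrays."""
    if idx is None:
        idx = np.arange(len(labels))
    # Assumed stopping criterion: the subset is pure (a single class label).
    # Labels are consulted only here, never in the split itself.
    if np.unique(labels[idx]).size <= 1:
        return [idx]
    mask = fiedler_split(X[idx], sigma)
    if mask.all() or not mask.any():
        return [idx]                        # degenerate split: stop here
    return (partition(X, labels, idx[mask], sigma)
            + partition(X, labels, idx[~mask], sigma))


# Toy data: three tight groups on a line; two of them share the label "A".
offsets = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]])
X = np.vstack([offsets + [0.0, 0.0],       # samples 0-2, label "A"
               offsets + [3.0, 0.0],       # samples 3-5, label "B"
               offsets + [7.0, 0.0]])      # samples 6-8, label "A"
labels = np.array(["A"] * 3 + ["B"] * 3 + ["A"] * 3)

leaves = partition(X, labels)
for leaf in leaves:
    print(labels[leaf[0]], sorted(leaf.tolist()))
```

On this toy input the two "A" groups end up in separate leaves, which mirrors the abstract's point that several subsets may correspond to the same class label. The order in which leaves are returned depends on the arbitrary sign of the eigenvector, so it should not be relied upon.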

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Animals
Gene Expression Profiling / methods*
Gene Expression Regulation*
Humans
Leukemia / metabolism*
Lymphoma / metabolism*
Oligonucleotide Array Sequence Analysis / methods*
Software*

Grants and funding

U56 CA112973/CA/NCI NIH HHS/United States