Support vector machine classification and validation of cancer tissue samples using microarray expression data

T S Furey; N Cristianini; N Duffy; D W Bednarski; M Schummer; D Haussler

doi:10.1093/bioinformatics/16.10.906

Bioinformatics. 2000 Oct;16(10):906-14. doi: 10.1093/bioinformatics/16.10.906.

Authors

T S Furey¹, N Cristianini, N Duffy, D W Bednarski, M Schummer, D Haussler

Affiliation

¹ Department of Computer Science, University of California, Santa Cruz, Santa Cruz, CA 95064, USA.

PMID: 11120680
DOI: 10.1093/bioinformatics/16.10.906

Abstract

Motivation: DNA microarray experiments, which generate thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data using support vector machines (SVMs). This analysis consists of both classification of the tissue samples and an exploration of the data for mislabeled or questionable tissue results.

Results: We demonstrate the method in detail on samples consisting of ovarian cancer tissues, normal ovarian tissues, and other normal tissues. The dataset consists of expression experiment results for 97,802 cDNAs for each tissue. As a result of computational analysis, a tissue sample is discovered and confirmed to be wrongly labeled. Upon correction of this mistake and the removal of an outlier, perfect classification of tissues is achieved, but not with high confidence. We identify and analyse a subset of genes from the ovarian dataset whose expression is highly differentiated between the types of tissues. To show robustness of the SVM method, two previously published datasets from other types of tissues or cells are analysed. The results are comparable to those previously obtained. We show that other machine learning methods also perform comparably to the SVM on many of those datasets.

Availability: The SVM software is available at http://www.cs.columbia.edu/~bgrundy/svm.
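
Illustration: The abstract does not spell out how questionable tissue labels are found, so the following is only a minimal sketch of one plausible approach, not the authors' software. It assumes a hold-one-out scheme: each sample is left out in turn, a linear SVM is trained on the rest, and samples whose held-out prediction contradicts their recorded label are flagged for review. The trainer is a simple subgradient-descent stand-in for a real SVM solver, and the function names, parameters, and two-feature toy data are all hypothetical.

```python
# Toy sketch (assumption): flag possibly mislabeled samples with a linear SVM
# and hold-one-out testing. Data, names, and parameters are hypothetical.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, lr=0.01, epochs=200):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss.

    A simple stand-in for a proper SVM solver; labels must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A sample contributes a hinge subgradient only inside the margin.
            violated = yi * (dot(w, xi) + b) < 1.0
            # Shrink w (gradient of the regulariser).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def hold_one_out_margins(X, y):
    """For each sample, train without it and return label * decision value.

    A negative value means the held-out prediction disagrees with the label.
    """
    margins = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w, b = train_linear_svm(X_train, y_train)
        margins.append(y[i] * (dot(w, X[i]) + b))
    return margins


# Hypothetical expression levels for two genes per tissue sample.
X = [
    [2.0, 2.2], [2.3, 1.8], [1.8, 2.1], [2.2, 2.4],          # labeled tumor (+1)
    [-2.0, -2.1], [-2.2, -1.9], [-1.8, -2.3], [-2.4, -2.0],  # labeled normal (-1)
    [2.1, 2.0],  # looks like tumor but is recorded as normal (-1)
]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]

margins = hold_one_out_margins(X, y)
suspects = [i for i, m in enumerate(margins) if m < 0]
print("samples to re-examine:", suspects)
```

On this toy data the deliberately mislabeled sample (index 8) should come out with a negative held-out margin. A flagged sample is only a candidate for re-examination; as in the abstract, a suspected mislabel still needs independent confirmation.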

Publication types

Evaluation Study

MeSH terms

Acute Disease
Algorithms*
Artificial Intelligence
Colonic Neoplasms / classification*
Colonic Neoplasms / genetics
Colonic Neoplasms / pathology
DNA, Neoplasm / analysis*
Databases, Factual*
Female
Humans
Leukemia, Myeloid / classification*
Leukemia, Myeloid / genetics
Leukemia, Myeloid / pathology
Oligonucleotide Array Sequence Analysis / methods*
Ovarian Neoplasms / classification*
Ovarian Neoplasms / genetics
Ovarian Neoplasms / pathology
Ovary / pathology
Precursor Cell Lymphoblastic Leukemia-Lymphoma / classification*
Precursor Cell Lymphoblastic Leukemia-Lymphoma / genetics
Precursor Cell Lymphoblastic Leukemia-Lymphoma / pathology
Software

Substances

DNA, Neoplasm