A log likelihood predictor for genomic classification of oral cancer using principle component analysis for feature selection

Mark E Whipple; Eduardo Mendez; D Gregory Farwell; S Nicholas Agoff; Chu Chen

Stud Health Technol Inform. 2004;107(Pt 2):823-6.

Authors

Mark E Whipple¹, Eduardo Mendez, D Gregory Farwell, S Nicholas Agoff, Chu Chen

Affiliation

¹ Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, WA 98105, USA.

PMID: 15360927

Abstract

DNA microarrays are powerful tools for exploring gene expression and predicting disease state. However, because the number of variables (genes) typically exceeds the number of samples (tissue specimens), many potentially spurious genes may be selected for a predictor function. Principal component analysis (PCA) can greatly reduce the dimensionality of the microarray data space while retaining most of the inherent variability. We propose a methodology that uses PCA to identify a predictor vector between two mutually exclusive and collectively exhaustive classes. By projecting the training set onto this vector, a distribution of projections can be computed for each class. A log-likelihood ratio is then calculated for class membership. We used this methodology to classify 48 biopsy specimens as either oral squamous cell carcinoma or normal oral mucosa using oligonucleotide microarrays. The system was trained on half of the samples and correctly predicted the class membership of the other half. The three most strongly positive and the three most strongly negative predictive genes were all keratins that are known markers of squamous cell carcinoma.
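The pipeline described in the abstract (PCA reduction, a predictor vector between two classes, per-class projection distributions, and a log-likelihood ratio) can be illustrated with a minimal Python sketch. The abstract does not specify how the predictor vector is derived from the principal components or which distribution is fitted to the projections, so the choices below are assumptions made only for illustration: the vector is taken as the difference of class means in principal-component space, each class's projections are modeled as Gaussian, and the data are synthetic. The names `fit_predictor`, `log_likelihood_ratio`, and `n_components` are hypothetical.

```python
import numpy as np


def fit_predictor(X, y, n_components=5):
    """Fit a PCA-based predictor vector and per-class projection statistics.

    X: (samples, genes) expression matrix; y: array of 0/1 class labels.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD: rows of Vt are the principal axes in gene space.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Vt[:n_components]
    scores = Xc @ pcs.T
    # Assumption: predictor vector = difference of class means in PC space,
    # mapped back to gene space and normalized to unit length.
    d = scores[y == 1].mean(axis=0) - scores[y == 0].mean(axis=0)
    w = pcs.T @ d
    w /= np.linalg.norm(w)
    # Project the training set and summarize each class's projections.
    proj = Xc @ w
    params = {c: (proj[y == c].mean(), proj[y == c].std(ddof=1)) for c in (0, 1)}
    return mean, w, params


def log_likelihood_ratio(X, mean, w, params):
    """Log-likelihood ratio (class 1 vs class 0), assuming Gaussian projections."""
    proj = (X - mean) @ w

    def logpdf(x, mu, sigma):
        return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

    return logpdf(proj, *params[1]) - logpdf(proj, *params[0])


# Synthetic stand-in data (not the study's data): 48 samples, 200 "genes",
# with the first 10 genes shifted in class 1.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 24)
X = rng.normal(size=(48, 200))
X[y == 1, :10] += 4.0

# Train on one half of the samples, predict the other half.
X_train, y_train = X[:24], y[:24]
X_test, y_test = X[24:], y[24:]
mean, w, params = fit_predictor(X_train, y_train)
llr = log_likelihood_ratio(X_test, mean, w, params)
predicted = (llr > 0).astype(int)
accuracy = (predicted == y_test).mean()
print("held-out accuracy on synthetic data:", accuracy)
```

In this sketch a positive log-likelihood ratio favors class 1 and a negative one favors class 0. The entries of `w` act as per-gene weights, so sorting them (for example with `np.argsort(w)`) would be one way to list the most strongly positive and negative predictive genes under these assumptions.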

MeSH terms

Biopsy
Carcinoma, Squamous Cell / genetics*
Carcinoma, Squamous Cell / pathology
Gene Expression Profiling
Genetic Markers
Humans
Keratins / genetics*
Linear Models
Mouth Mucosa / cytology
Mouth Mucosa / pathology
Mouth Neoplasms / genetics*
Mouth Neoplasms / pathology
Oligonucleotide Array Sequence Analysis*
Principal Component Analysis*
RNA, Messenger / analysis

Substances

Genetic Markers
RNA, Messenger
Keratins