Averaged gene expressions for regression

Mee Young Park; Trevor Hastie; Robert Tibshirani

doi:10.1093/biostatistics/kxl002

Biostatistics. 2007 Apr;8(2):212-27. doi: 10.1093/biostatistics/kxl002. Epub 2006 May 11.

Authors

Mee Young Park¹, Trevor Hastie, Robert Tibshirani

Affiliation

¹ Google Inc, Mountain View, CA 94043, USA.

PMID: 16698769
DOI: 10.1093/biostatistics/kxl002

Abstract

Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression on DNA microarray data, which pose the challenge of having far more features than samples. In this paper, we introduce a two-step procedure that combines (1) hierarchical clustering and (2) the Lasso. By averaging the genes within the clusters obtained from hierarchical clustering, we define supergenes and use them to fit regression models, thereby attaining concise interpretation and accuracy. Our methods are supported with theoretical justifications and demonstrated on simulated and real data sets.
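
The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cluster_genes`, `supergenes`, `lasso_cd`), the dissimilarity (1 minus Pearson correlation), average linkage, the fixed number of clusters, the penalty value `lam`, and the simulated data are all assumptions chosen only for illustration. The sketch clusters genes hierarchically, averages each cluster into a supergene, and fits a Lasso on the supergenes by coordinate descent.

```python
import numpy as np

def cluster_genes(X, n_clusters):
    """Naive average-linkage agglomerative clustering of the columns (genes) of X.
    Dissimilarity is 1 - Pearson correlation (an illustrative assumption)."""
    dist = 1.0 - np.corrcoef(X, rowvar=False)
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise dissimilarity between the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

def supergenes(X, clusters):
    """Average the genes within each cluster to form one supergene per cluster."""
    return np.column_stack([X[:, c].mean(axis=1) for c in clusters])

def lasso_cd(Z, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - b0 - Z b||^2 + lam * ||b||_1."""
    n, k = Z.shape
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, resid = Z - z_mean, y - y_mean        # centering handles the intercept
    col_sq = (Zc ** 2).sum(axis=0) / n
    beta = np.zeros(k)
    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue
            rho = Zc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid += Zc[:, j] * (beta[j] - new)  # keep residual in sync
            beta[j] = new
    return y_mean - z_mean @ beta, beta

# Simulated example (hypothetical data): 60 genes in 3 correlated groups, 30 samples.
rng = np.random.default_rng(0)
n, genes_per_group = 30, 20
latent = rng.normal(size=(n, 3))
X = np.repeat(latent, genes_per_group, axis=1) + 0.3 * rng.normal(size=(n, 60))
y = 2.0 * latent[:, 0] + 0.1 * rng.normal(size=n)  # response driven by group 0 only

clusters = cluster_genes(X, n_clusters=3)
Z = supergenes(X, clusters)                  # n x 3 matrix of supergenes
intercept, beta = lasso_cd(Z, y, lam=0.1)
print("cluster sizes:", [len(c) for c in clusters])
print("supergene coefficients:", np.round(beta, 3))
```

In this toy setting the regression is fit on 3 averaged features instead of 60 noisy ones, which is the variance-reduction effect the abstract refers to. In practice the number of clusters and the penalty would need to be chosen from the data, for example by cross-validation; the fixed values above are placeholders.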

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Breast Neoplasms / genetics
Cluster Analysis*
Computer Simulation
Female
Gene Expression Profiling / methods*
Humans
Models, Statistical*
Oligonucleotide Array Sequence Analysis / methods*
Regression Analysis*
