The properties of high-dimensional data spaces: implications for exploring gene and protein expression data

Robert Clarke; Habtom W Ressom; Antai Wang; Jianhua Xuan; Minetta C Liu; Edmund A Gehan; Yue Wang

doi:10.1038/nrc2294

Nat Rev Cancer. 2008 Jan;8(1):37-49. doi: 10.1038/nrc2294.

Affiliation

Robert Clarke: Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University School of Medicine, 3970 Reservoir Road NW, Washington, DC 20057, USA.

PMID: 18097463
PMCID: PMC2238676
DOI: 10.1038/nrc2294

Abstract

High-throughput genomic and proteomic technologies are widely used in cancer research to build better predictive models of diagnosis, prognosis and therapy, to identify and characterize key signalling networks and to find new targets for drug development. These technologies present investigators with the task of extracting meaningful statistical and biological information from high-dimensional data spaces, wherein each sample is defined by hundreds or thousands of measurements, usually concurrently obtained. The properties of high dimensionality are often poorly understood or overlooked in data modelling and analysis. From the perspective of translational science, this Review discusses the properties of high-dimensional data spaces that arise in genomic and proteomic studies and the challenges they can pose for data analysis and interpretation.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Review

MeSH terms

Gene Expression Regulation*
Genes*
Genome
Humans
Models, Genetic
Models, Statistical
Neoplasms / classification
Neoplasms / genetics*
Prognosis
Proteins / genetics*
Proteome
Transcription, Genetic

Substances

Proteins
Proteome