Integrative analysis of high-throughput cancer studies with contrasted penalization

Xingjie Shi; Jin Liu; Jian Huang; Yong Zhou; BenChang Shia; Shuangge Ma

doi:10.1002/gepi.21781

Genet Epidemiol. 2014 Feb;38(2):144-51. doi: 10.1002/gepi.21781. Epub 2014 Jan 6.

Authors

Xingjie Shi¹, Jin Liu, Jian Huang, Yong Zhou, BenChang Shia, Shuangge Ma

Affiliation

¹ Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America; School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China.

Abstract

In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms "classic" meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances beyond the published methods by introducing contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark, identifying markers more accurately. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those identified by the benchmark and has better prediction performance.

Keywords: contrasted penalization; high-throughput cancer studies; integrative analysis; marker selection.
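
The idea of smoothing regression coefficients across datasets, as described in the abstract, can be illustrated with a small sketch. The abstract does not state the penalty or the algorithm in detail, so everything below is an assumption made for illustration: a lasso penalty stands in for the marker-selection penalty, the contrast penalty is taken to be the sum of squared differences between the coefficients of the same marker in different datasets, the loss is least squares, and a plain cyclic coordinate descent replaces the paper's outer/inner iteration. The names `contrasted_lasso`, `objective`, `lam1`, and `lam2` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by lasso coordinate updates."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def objective(Xs, ys, B, lam1, lam2):
    """Assumed objective: per-dataset least squares + lasso + squared contrasts."""
    M = B.shape[1]
    fit = sum(
        np.sum((y - X @ B[:, m]) ** 2) / (2 * len(y))
        for m, (X, y) in enumerate(zip(Xs, ys))
    )
    sparsity = lam1 * np.abs(B).sum()
    contrast = 0.5 * lam2 * sum(
        np.sum((B[:, m] - B[:, k]) ** 2)
        for m in range(M)
        for k in range(m + 1, M)
    )
    return fit + sparsity + contrast


def contrasted_lasso(Xs, ys, lam1, lam2, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for the assumed objective above.

    Xs, ys: lists of M design matrices (n_m x p) and responses (n_m,).
    Returns a p x M coefficient matrix, one column per dataset
    (heterogeneity model: each dataset has its own coefficients).
    """
    M, p = len(Xs), Xs[0].shape[1]
    B = np.zeros((p, M))
    resid = [np.asarray(y, dtype=float).copy() for y in ys]  # y_m - X_m b_m
    for _ in range(n_iter):
        max_change = 0.0
        for m in range(M):
            X, n = Xs[m], Xs[m].shape[0]
            for j in range(p):
                xj, old = X[:, j], B[j, m]
                curvature = xj @ xj / n
                # Gradient information from the loss (partial residual) ...
                z = xj @ resid[m] / n + curvature * old
                # ... plus the pull toward the same marker in other datasets.
                z += lam2 * (B[j].sum() - old)
                a = curvature + lam2 * (M - 1)
                new = soft_threshold(z, lam1) / a if a > 0 else 0.0
                if new != old:
                    resid[m] -= xj * (new - old)
                    B[j, m] = new
                    max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return B
```

In this sketch, `lam1` controls sparsity and `lam2` controls how strongly the coefficients of one marker are shrunk toward each other across datasets; setting `lam2 = 0` reduces it to separate lasso fits, one per dataset.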

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Breast Neoplasms / diagnosis
Breast Neoplasms / genetics
Computer Simulation
Female
Genetic Markers
Humans
Lung Neoplasms / diagnosis
Lung Neoplasms / genetics
Models, Genetic
Neoplasms / diagnosis
Neoplasms / genetics*
Prognosis

Substances

Genetic Markers
