Given the promise of rich biological information in microarray data, we expect an increasing demand for a robust, practical and well-tested methodology for patient prognosis based on gene expression data. In standard settings with few clinical predictors, such a methodology is provided by the Cox proportional hazards model, but no corresponding methodology is available to deal with the full set of genes in microarray data. Furthermore, we want the procedure to handle general survival data that include censored observations. Conceptually, such a procedure can be constructed quite easily, but its implementation is far from straightforward because of computational problems. We have developed an approach that relies on an extension of the Cox partial likelihood that allows random effects parameters. In this approach, we use the full set of genes in the analysis and deal with survival data in the most general way. We describe the development of the model and the steps in the implementation, including a fast computational formula based on subsampling of the risk set and the singular value decomposition. Finally, we illustrate the methodology using a data set obtained from a cohort of breast cancer patients.
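To make the role of the singular value decomposition concrete, the following is a minimal sketch rather than the implementation described in the paper. It assumes that the random effects are Gaussian with variance `sigma2`, so that the extended likelihood reduces to a ridge-penalized Cox log partial likelihood, and it assumes distinct event times (no tie correction). The risk-set subsampling step is not shown. The function name, the simulated data and all parameter values are hypothetical.

```python
import numpy as np

def penalized_cox_loglik(eta, time, event, coef_sq_norm, sigma2):
    """Cox log partial likelihood (distinct times assumed) minus a ridge penalty.

    eta          : linear predictor for each patient
    coef_sq_norm : squared norm of the coefficient vector
    sigma2       : assumed variance of the Gaussian random effects
    """
    order = np.argsort(-time)                  # latest follow-up time first
    eta_s, event_s = eta[order], event[order]
    # Running log-sum-exp: log of sum of exp(eta_j) over patients still at risk
    log_risk = np.logaddexp.accumulate(eta_s)
    partial = np.sum((eta_s - log_risk)[event_s == 1])
    return partial - coef_sq_norm / (2.0 * sigma2)

# Simulated data (hypothetical): n patients, p genes, with p much larger than n
rng = np.random.default_rng(0)
n, p = 20, 500
X = rng.standard_normal((n, p))
time = rng.permutation(n).astype(float) + 1.0  # distinct follow-up times
event = rng.integers(0, 2, size=n)             # 1 = event observed, 0 = censored
sigma2 = 0.5

# Thin SVD: X = U diag(d) Vt, with U of size n x n and Vt of size n x p
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Any beta in the row space of X can be written as beta = Vt.T @ gamma
gamma = rng.standard_normal(n)
beta = Vt.T @ gamma

# Same objective, evaluated with p parameters and with only n parameters
full = penalized_cox_loglik(X @ beta, time, event, beta @ beta, sigma2)
reduced = penalized_cox_loglik(U @ (d * gamma), time, event, gamma @ gamma, sigma2)
```

Under these assumptions the two evaluations agree, because `X @ beta` equals `U @ (d * gamma)` and `beta @ beta` equals `gamma @ gamma`. A component of `beta` outside the row space of `X` would leave the linear predictor unchanged while increasing the penalty, so the maximization can be carried out over the n-dimensional `gamma` instead of the p-dimensional `beta`.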
Copyright 2004 John Wiley & Sons, Ltd.