High-density DNA microarrays measure the expression of large numbers of genes in a single assay. The ability to find underlying structure in complex gene expression data sets and rigorously test the association of that structure with biological conditions is essential to developing multi-faceted views of the gene activity that defines cellular phenotype. We sought to connect features of gene expression data with biological hypotheses by integrating 'metagene' patterns from DNA microarray experiments into the characterization and prediction of oncogenic phenotypes. We applied these techniques to the analysis of regulatory pathways controlled by the genes HRAS (Harvey rat sarcoma viral oncogene homolog), MYC (myelocytomatosis viral oncogene homolog) and E2F1, E2F2 and E2F3 (encoding E2F transcription factors 1, 2 and 3, respectively). The phenotypic models accurately predict the activity of these pathways in the context of normal cell proliferation. Moreover, the metagene models trained with gene expression patterns evoked by ectopic production of Myc or Ras proteins in primary tissue culture cells correctly predict the activity of in vivo tumor models that result from deregulation of the MYC or HRAS pathways. We conclude that these gene expression phenotypes have the potential to characterize the complex genetic alterations that typify the neoplastic state, whether in vitro or in vivo, in a way that reflects the complexity of the regulatory pathways that are affected.