Background: Signal transduction pathways convey information from the outside of the cell to transcription factors, which in turn regulate gene expression. Our objective is to analyze tumor gene expression data from microarrays in the context of such pathways.
Results: We use pathways compiled from the TRANSPATH/TRANSFAC databases and the literature, and three publicly available cancer microarray data sets. Variation in pathway activity, across the samples, is gauged by the degree of correlation between downstream targets of a pathway. Two correlation scores are applied; one considers all pairs of downstream targets, and the other considers only pairs without common transcription factors. Several pathways are found to be differentially active in the data sets using these scores. Moreover, we devise a score for pathway activity in individual samples, based on the average expression value of the downstream targets. Statistical significance is assigned to the scores using permutation of genes as null model. Hence, for individual samples, the status of a pathway is given as a sign, + or -, and a p-value. This approach defines a projection of high-dimensional gene expression data onto low-dimensional pathway activity scores. For each dataset and many pathways we find a much larger number of significant samples than expected by chance. Finally, we find that several sample-wise pathway activities are significantly associated with clinical classifications of the samples.
Conclusion: This study shows that it is feasible to infer signal transduction pathway activity, in individual samples, from gene expression data. Furthermore, these pathway activities are biologically relevant in the three cancer data sets.