Cancer cells acquire genetic and epigenetic alterations that often lead to dysregulation of oncogenic signal transduction pathways, which in turn alters downstream transcriptional programs. Numerous methods attempt to deduce aberrant signaling pathways in tumors from mRNA data alone, but these pathway analysis approaches remain qualitative and imprecise. In this study, we present a statistical method to link upstream signaling to downstream transcriptional response by exploiting reverse phase protein array (RPPA) and mRNA expression data in The Cancer Genome Atlas (TCGA) breast cancer project. Formally, we use an algorithm called affinity regression to learn an interaction matrix between upstream signal transduction proteins and downstream transcription factors (TFs) that explains target gene expression. The trained model can then predict TF activity, given a tumor sample's protein expression profile, or infer signaling protein activity, given a tumor sample's gene expression profile. Breast cancers comprise molecularly distinct subtypes that respond differently to pathway-targeted therapies. We trained our model on the TCGA breast cancer data set and identified subtype-specific and common TF regulators of gene expression. We then used the trained tumor model to predict signaling protein activity in a panel of breast cancer cell lines for which gene expression and drug response data were available. Correlations between inferred protein activities and drug responses in breast cancer cell lines grouped several drugs that are clinically used in combination. Finally, inferred protein activity predicted clinical outcome within the METABRIC Luminal A cohort, identifying high- and low-risk patient groups within this heterogeneous subtype.
© 2014 Osmanbeyoglu et al.; Published by Cold Spring Harbor Laboratory Press.