Identifying genes and molecular pathways with congruent profiles in proteomic and transcriptomic datasets may lead to the discovery of promising transcriptomic biomarkers that are more relevant to phenotypic changes. In this study, we conducted a comparative analysis of 943 paired RNA and proteomic profiles obtained for the same samples of seven human cancer types from The Cancer Genome Atlas (TCGA) and the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC), two major open human cancer proteomic and transcriptomic databases. The analysis covered 15,112 protein-coding genes and 1,611 molecular pathways. Overall, our findings demonstrated a statistically significant improvement in the congruence between RNA and proteomic profiles when the analysis was performed at the level of molecular pathways rather than at the level of individual gene products. Transition to the molecular pathway level of data analysis increased the correlation to 0.19-0.57 (Pearson) and 0.14-0.57 (Spearman), a 2-3-fold increase for some cancer types. Evaluating the gain in correlation upon transition to data analysis at the pathway level can be used to refine the omics data by identifying outliers that can be excluded from the comparison of RNA and proteomic profiles. We suggest using sample-wise and gene-wise correlations for individual genes and molecular pathways as a measure of the quality of paired RNA/protein molecular data. We also provide a database of human genes, molecular pathways, and samples annotated with the correlation between RNA and protein products to facilitate the exploration of new cancer transcriptomic biomarkers and molecular mechanisms at different levels of human gene expression.
Keywords: cancer genomics; high-throughput analysis of human gene expression; pathway activation level; proteomics; transcriptomics.
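
The sample-wise comparison summarized above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the pathway score here is simply the mean of member-gene values, used as a stand-in for a pathway activation level, and all function names, gene labels, and toy values are hypothetical. The sketch computes Pearson and Spearman correlations between the RNA and protein profiles of one sample, first over genes and then over pathways.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pathway_scores(profile, pathways):
    """Assumption: score a pathway as the mean of its member-gene values."""
    return {
        name: sum(profile[g] for g in genes) / len(genes)
        for name, genes in pathways.items()
    }


def sample_wise_correlation(rna, protein, method=pearson):
    """Correlate one sample's RNA and protein values over their shared keys."""
    keys = sorted(set(rna) & set(protein))
    return method([rna[k] for k in keys], [protein[k] for k in keys])


# Hypothetical toy data for a single sample (log-scale expression values).
rna = {"G1": 2.0, "G2": 5.0, "G3": 1.0, "G4": 4.0, "G5": 3.5, "G6": 0.5}
protein = {"G1": 1.0, "G2": 3.0, "G3": 2.5, "G4": 3.5, "G5": 2.0, "G6": 1.5}
pathways = {"P1": ["G1", "G2"], "P2": ["G3", "G4"], "P3": ["G5", "G6"]}

gene_level = sample_wise_correlation(rna, protein)
pathway_level = sample_wise_correlation(
    pathway_scores(rna, pathways), pathway_scores(protein, pathways)
)
print("gene-level Pearson:", gene_level)
print("pathway-level Pearson:", pathway_level)
print("gene-level Spearman:", sample_wise_correlation(rna, protein, spearman))
```

The gene-wise variant follows the same pattern with the roles swapped: for a fixed gene or pathway, the two input sequences hold its RNA and protein values across samples rather than across genes.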