The availability of large-scale repositories and integrated cancer genome efforts has created unprecedented opportunities to study and describe cancer biology. In this context, translational researchers aim to integrate multiple omics data to better identify homogeneous subgroups of patients and thereby develop adequate diagnostic and treatment strategies from a personalized medicine perspective. So far, existing integrative methods have pooled information across omics data types, leaving out the phenotypic interpretation of each individual omics data type. Here, we present the Massive and Integrative Gene Set Analysis (MIGSA) R package. This tool can analyze several high-throughput experiments in a comprehensive way through a functional analysis strategy, relating a phenotype to its biological function counterpart defined by means of gene sets. By simultaneously querying multiple omics data sets from the same or different groups of patients, common and specific functional patterns for each studied phenotype can be obtained. The usefulness of MIGSA was demonstrated by applying the package to functionally characterize the intrinsic breast cancer PAM50 subtypes. For each subtype, specific functional transcriptomic profiles and gene sets enriched by transcriptomic and proteomic data were identified. To achieve this, transcriptomic and proteomic data from 28 datasets were analyzed using MIGSA. As a result, enriched gene sets and important genes were consistently found to be related to a specific subtype across experiments or data types and can thus be used as molecular signature biomarkers.
Keywords: Big omics data; Biological insight; Breast cancer; Functional analysis; Knowledge discovery; Multiple omics.
Copyright © 2019 Elsevier Inc. All rights reserved.