Exploration of Genomic, Proteomic, and Histopathological Image Data Integration Methods for Clinical Prediction

IEEE China Summit Int Conf Signal Inf Process. 2013 Jul:2013:259-263. doi: 10.1109/ChinaSIP.2013.6625340. Epub 2013 Oct 10.

Abstract

The emergence of large multi-platform and multi-scale data repositories in biomedicine has enabled the exploration of data integration for holistic decision making. In this research, we investigate the integration of multi-modal genomic, proteomic, and histopathological image data for predicting ovarian cancer clinical endpoints in The Cancer Genome Atlas (TCGA). Specifically, we study two data integration techniques, simple data concatenation and ensemble classification, to determine whether they can improve prediction of ovarian cancer grade or patient survival. Results indicate that integration via ensemble classification is more effective than simple data concatenation. We also highlight several key factors that affect data integration outcomes, such as predictability of the endpoint, class prevalence, and unbalanced representation of features from different data modalities.
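
The difference between the two integration techniques named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state which classifiers or combining rule were used, so a nearest-centroid classifier and majority voting stand in here, and the feature values are made-up toy numbers, not TCGA data. All function and variable names are hypothetical.

```python
from collections import Counter
from math import dist

def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(centroids, row):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))

def concatenate(*modalities):
    """Simple data concatenation: join each sample's features end to end."""
    return [[value for part in parts for value in part]
            for parts in zip(*modalities)]

def ensemble_predict(models, parts):
    """Ensemble classification (assumed rule): one model per modality,
    combined by majority vote."""
    votes = [predict(model, part) for model, part in zip(models, parts)]
    return Counter(votes).most_common(1)[0][0]

# Toy data (invented): four samples, three modalities, binary endpoint.
genomic = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
proteomic = [[0.1], [0.0], [1.0], [0.8]]
image = [[5.0, 5.0, 5.1], [5.2, 4.9, 5.0], [9.0, 9.1, 8.8], [9.2, 9.0, 9.1]]
labels = [0, 0, 1, 1]

# Technique 1: train a single model on the concatenated feature vectors.
X_concat = concatenate(genomic, proteomic, image)
concat_model = fit_centroids(X_concat, labels)

# Technique 2: train one model per modality and vote.
models = [fit_centroids(m, labels) for m in (genomic, proteomic, image)]

new_sample = ([0.95, 1.0], [0.9], [9.1, 9.0, 9.0])
concat_label = predict(concat_model, concatenate(*[[p] for p in new_sample])[0])
ensemble_label = ensemble_predict(models, new_sample)
```

In this toy setup the image modality supplies three of the six concatenated features and has the largest numeric range, so it dominates the distance computation in the concatenated model, whereas each modality gets exactly one vote in the ensemble. This is one way the unbalanced representation of features across modalities could influence the outcome of integration.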