Investigating Deep Learning Based Breast Cancer Subtyping Using Pan-Cancer and Multi-Omic Data

Francisco Cristovao; Silvia Cascianelli; Arif Canakoglu; Mark Carman; Luca Nanni; Pietro Pinoli; Marco Masseroli

doi:10.1109/TCBB.2020.3042309

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):121-134. doi: 10.1109/TCBB.2020.3042309. Epub 2022 Feb 3.

PMID: 33270566

Abstract

Breast cancer comprises multiple subtypes that are relevant to prognosis. Existing stratification methods rely on quantifying the expression of small gene sets. Next-generation sequencing promises large amounts of omic data in the coming years. In this scenario, we explore the potential of machine learning and, in particular, deep learning for breast cancer subtyping. Due to the paucity of publicly available data, we leverage pan-cancer and non-cancer data to design semi-supervised settings. We make use of multi-omic data, including microRNA expression and copy number alterations, and we provide an in-depth investigation of several supervised and semi-supervised architectures. The accuracy results show that simpler models perform at least as well as the deep semi-supervised approaches on our task over gene expression data. When multi-omic data types are combined, the deep models show little (if any) improvement in accuracy, indicating the need for further analysis on larger multi-omic datasets as and when they become available. From a biological perspective, our linear model mostly confirms known gene-subtype annotations. Conversely, deep approaches model non-linear relationships, which is reflected in a more varied and still unexplored set of representative omic features that may prove useful for breast cancer subtyping.
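
As an illustration of how a linear subtype classifier can point back to individual genes, the following minimal sketch fits a multinomial logistic regression with plain NumPy and lists the highest-weight feature per class. It is not the authors' model or data: the expression matrix is synthetic, the four classes and their "marker genes" are planted by construction, and the names `train_softmax` and `top_features` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def train_softmax(X, y, n_classes, lr=0.1, epochs=300, l2=1e-3):
    """Fit multinomial logistic regression by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoded labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        G = (P - Y) / n                              # gradient w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)                 # L2-regularised update
        b -= lr * G.sum(axis=0)
    return W, b

def top_features(W, k=3):
    """Indices of the k largest-magnitude weights for each class."""
    return np.argsort(-np.abs(W), axis=0)[:k].T

# Synthetic stand-in for an expression matrix (assumption, not real data):
# 4 hypothetical subtypes, 50 "genes", 60 samples per subtype.
rng = np.random.default_rng(0)
n_classes, n_genes, per_class = 4, 50, 60
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(size=(y.size, n_genes))
for c in range(n_classes):
    X[y == c, c] += 3.0  # plant gene c as the marker of subtype c

W, b = train_softmax(X, y, n_classes)
pred = (X @ W + b).argmax(axis=1)
accuracy = (pred == y).mean()          # training accuracy on the toy data
markers = top_features(W, k=1)[:, 0]   # strongest feature per subtype
```

Because each class has its own weight column, the largest-magnitude entries in a column can be read as the features most associated with that class. This direct readout is what makes a linear model comparatively easy to check against known gene-subtype annotations, whereas non-linear models need additional attribution methods.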

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Breast Neoplasms* / genetics
DNA Copy Number Variations
Deep Learning*
Female
Humans
Machine Learning
Supervised Machine Learning