Shallow Sparsely-Connected Autoencoders for Gene Set Projection

Pac Symp Biocomput. 2019;24:374-385.

Authors

Maxwell P Gold¹, Alexander LeNail¹, Ernest Fraenkel¹

Affiliation

¹ Department of Biological Engineering, Massachusetts Institute of Technology, 21 Ames St. Cambridge, MA, 02139, USA.

PMID: 30963076
PMCID: PMC6417803

Abstract

When analyzing biological data, it can be helpful to consider gene sets, or predefined groups of biologically related genes. Methods exist for identifying gene sets that differ between conditions, but large public datasets from consortium projects and single-cell RNA-Sequencing have opened the door for gene set analysis using more sophisticated machine learning techniques, such as autoencoders and variational autoencoders. We present shallow sparsely-connected autoencoders (SSCAs) and variational autoencoders (SSCVAs) as tools for projecting gene-level data onto gene sets. We tested these approaches on single-cell RNA-Sequencing data from blood cells and on RNA-Sequencing data from breast cancer patients. Both SSCA and SSCVA can recover known biological features from these datasets, and the SSCVA method often outperforms SSCA (and six existing gene set scoring algorithms) on classification and prediction tasks.
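
As a rough illustration of what projecting gene-level data onto gene sets could look like, the following is a minimal sketch and not the authors' implementation. It assumes a single hidden layer with one node per gene set, where each node is connected only to its member genes through a binary mask. The tanh activation, the masked decoder, the plain gradient-descent training loop, and the function names (build_mask, train_ssca, project) are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def build_mask(genes, gene_sets):
    """Binary (n_genes x n_sets) matrix: 1 where a gene belongs to a set."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(gene_sets)))
    for j, members in enumerate(gene_sets.values()):
        for g in members:
            if g in index:  # genes absent from the data are ignored
                mask[index[g], j] = 1.0
    return mask

def train_ssca(X, mask, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder whose weights follow the mask.

    X is (n_samples x n_genes). Masking the decoder as well as the encoder
    is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_sets = mask.shape
    W_enc = rng.normal(0, 0.1, (n_genes, n_sets)) * mask
    W_dec = rng.normal(0, 0.1, (n_sets, n_genes)) * mask.T
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)        # one score per gene set
        X_hat = H @ W_dec             # reconstruction of gene-level data
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the squared error (constant factors folded into lr).
        grad_dec = H.T @ err / len(X)
        grad_H = err @ W_dec.T
        grad_enc = X.T @ (grad_H * (1 - H ** 2)) / len(X)
        # Re-applying the mask keeps absent connections at exactly zero.
        W_enc -= lr * grad_enc * mask
        W_dec -= lr * grad_dec * mask.T
    return W_enc, W_dec, losses

def project(X, W_enc):
    """Gene set scores: the hidden-layer activations for each sample."""
    return np.tanh(X @ W_enc)

if __name__ == "__main__":
    # Hypothetical toy genes and gene sets, not taken from the paper.
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
    gene_sets = {"setA": ["g1", "g2", "g3"], "setB": ["g4", "g5", "g6"]}
    X = np.random.default_rng(1).normal(size=(50, len(genes)))
    mask = build_mask(genes, gene_sets)
    W_enc, W_dec, losses = train_ssca(X, mask)
    print(project(X, W_enc).shape)
```

In this sketch the hidden activations serve as the per-sample gene set scores, which could then feed a downstream classifier. A variational variant would replace the deterministic hidden layer with a sampled one and add a KL term to the loss; that part is not shown here.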

Keywords: autoencoder; gene set; single-cell RNA-Sequencing; variational autoencoder.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Blood Cells / metabolism
Breast Neoplasms / genetics
Breast Neoplasms / mortality
Computational Biology
Female
Gene Expression Profiling / statistics & numerical data*
Gene Regulatory Networks*
Humans
Neural Networks, Computer
Sequence Analysis, RNA / statistics & numerical data*
Single-Cell Analysis / statistics & numerical data
Supervised Machine Learning
Survival Analysis
