DECODE: a Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays

Zhanlin Chen; Jing Zhang; Jason Liu; Yi Dai; Donghoon Lee; Martin Renqiang Min; Min Xu; Mark Gerstein

doi:10.1093/bioinformatics/btab283

Bioinformatics. 2021 Jul 12;37(Suppl_1):i280-i288. doi: 10.1093/bioinformatics/btab283.

Authors

Zhanlin Chen¹, Jing Zhang², Jason Liu³, Yi Dai², Donghoon Lee⁴, Martin Renqiang Min⁵, Min Xu⁶, Mark Gerstein¹,³,⁷

Affiliations

¹ Department of Statistics & Data Science, Yale University, New Haven, CT 06520, USA.
² Department of Computer Science, University of California, Irvine, CA 92617, USA.
³ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.
⁴ Genetics and Genomic Sciences, The Icahn School of Medicine at Mount Sinai, New York, NY 10029-6574, USA.
⁵ NEC Laboratories America, Princeton, NJ 08540, USA.
⁶ Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
⁷ Department of Computer Science, Yale University, New Haven, CT 06520, USA.

Abstract

Motivation: Mapping distal regulatory elements, such as enhancers, is a cornerstone for elucidating how genetic variations may influence diseases. Previous enhancer-prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have implemented enhancer discovery as a binary classification problem without accurate boundary detection, producing low-resolution annotations with superfluous regions and reducing the statistical power for downstream analyses (e.g. causal variant mapping and functional validations). Here, we addressed these challenges via a two-step model called Deep-learning framework for Condensing enhancers and refining boundaries with large-scale functional assays (DECODE). First, we employed direct enhancer-activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution, we implemented a weakly supervised object detection framework for enhancer localization with precise boundary detection (to a 10 bp resolution) using Gradient-weighted Class Activation Mapping.
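The localization step can be illustrated with a minimal sketch of Gradient-weighted Class Activation Mapping on a one-dimensional signal track. This is not the DECODE implementation: the function names, the 0.5 threshold, the peak normalization and the assumption that each feature-map position corresponds to one 10 bp bin are all hypothetical choices made for illustration. The inputs stand in for the feature maps of a trained classifier's last convolutional layer and the gradients of the enhancer score with respect to them.

```python
import numpy as np


def grad_cam_1d(activations, gradients):
    """Grad-CAM for 1D feature maps.

    activations, gradients: arrays of shape (channels, positions), where
    gradients holds d(score)/d(activations) for the enhancer class.
    Returns a map of length `positions`, scaled to [0, 1].
    """
    # Channel weights: average gradient over all positions.
    weights = gradients.mean(axis=1)
    # Weighted sum of channels followed by ReLU keeps positive evidence only.
    cam = np.maximum(weights @ activations, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def refine_boundaries(cam, bin_size=10, threshold=0.5):
    """Turn a class activation map into (start_bp, end_bp) intervals.

    bin_size and threshold are illustrative assumptions, not values
    taken from the paper. End coordinates are exclusive.
    """
    mask = cam >= threshold
    # Pad with False so runs touching either edge are still closed.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s) * bin_size, int(e) * bin_size) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    # Toy example: two channels over eight 10 bp bins.
    acts = np.array([[0, 0, 1, 2, 1, 0, 0, 0],
                     [1, 1, 0, 0, 0, 1, 1, 1]], dtype=float)
    grads = np.array([[1.0] * 8, [-1.0] * 8])
    print(refine_boundaries(grad_cam_1d(acts, grads)))
```

In this toy case, only the bins where the positively weighted channel is active survive thresholding, so an 80 bp input window is condensed to a single shorter interval. Raising the threshold gives tighter boundaries at the cost of discarding weaker signal.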

Results: Our DECODE binary classifier outperformed a state-of-the-art enhancer prediction method by 24% in transgenic mouse validation. Furthermore, the object detection framework can condense enhancer annotations to only 13% of their original size, and these compact annotations have significantly higher conservation scores and genome-wide association study variant enrichments than the original predictions. Overall, DECODE is an effective tool for enhancer classification and precise localization.

Availability and implementation: DECODE source code and pre-processing scripts are available at decode.gersteinlab.org.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Animals
Deep Learning*
Enhancer Elements, Genetic* / genetics
Genome-Wide Association Study
Mice
Neural Networks, Computer
Software
