BindSpace decodes transcription factor binding signals by large-scale sequence embedding

Han Yuan; Meghana Kshirsagar; Lee Zamparo; Yuheng Lu; Christina S Leslie

doi:10.1038/s41592-019-0511-y

Nat Methods. 2019 Sep;16(9):858-861. doi: 10.1038/s41592-019-0511-y. Epub 2019 Aug 12.

Authors

Han Yuan¹,², Meghana Kshirsagar¹, Lee Zamparo¹, Yuheng Lu¹, Christina S Leslie³

Affiliations

¹ Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
² Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA.
³ Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.

Abstract

The decoding of transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF labels into the same space. By training on binding data from hundreds of TFs and embedding over 1 M DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish between signals of closely related TFs.
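The idea of placing DNA sequences and TF labels in one shared space can be illustrated with a minimal sketch. It is not the BindSpace implementation: the k-mer representation, the averaging step, the cosine score, and every name below (`K`, `kmers`, `embed_sequence`, `rank_labels`, `TF_A`, `TF_B`) are assumptions made for illustration, and the vectors are set by hand rather than learned from binding data. Only the scoring step is shown; training is omitted.

```python
import math

K = 3  # hypothetical k-mer length, chosen only to keep the toy example small


def kmers(seq, k=K):
    """Return all overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed_sequence(seq, kmer_vecs, dim):
    """Embed a sequence as the mean of the vectors of its known k-mers."""
    total = [0.0] * dim
    hits = 0
    for km in kmers(seq):
        vec = kmer_vecs.get(km)
        if vec is None:
            continue  # k-mers without a vector are ignored
        total = [t + v for t, v in zip(total, vec)]
        hits += 1
    return [t / hits for t in total] if hits else total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_labels(seq, kmer_vecs, label_vecs, dim):
    """Rank TF labels by similarity to the sequence in the shared space."""
    query = embed_sequence(seq, kmer_vecs, dim)
    scores = {label: cosine(query, vec) for label, vec in label_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hand-set (untrained) vectors in a 2-dimensional shared space.
kmer_vecs = {"ACG": [1.0, 0.0], "CGT": [1.0, 0.0], "TTT": [0.0, 1.0]}
label_vecs = {"TF_A": [1.0, 0.0], "TF_B": [0.0, 1.0]}

print(rank_labels("ACGT", kmer_vecs, label_vecs, dim=2))
print(rank_labels("TTTT", kmer_vecs, label_vecs, dim=2))
```

Because labels and sequences live in the same space, multiclass prediction in this sketch reduces to a nearest-label lookup, and two closely related TFs could in principle be told apart by which label vector lies closer to the sequence.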

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Binding Sites
Chromatin Immunoprecipitation
Computational Biology / methods*
DNA / chemistry
DNA / metabolism*
Humans
Machine Learning*
Protein Binding
Transcription Factors / metabolism*

Substances

Transcription Factors
DNA
