BindSpace decodes transcription factor binding signals by large-scale sequence embedding

Nat Methods. 2019 Sep;16(9):858-861. doi: 10.1038/s41592-019-0511-y. Epub 2019 Aug 12.

Abstract

The decoding of transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF labels into the same space. By training on binding data from hundreds of TFs and embedding over 1 million DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish between signals of closely related TFs.
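
The idea of placing DNA sequences and TF labels in one shared space can be illustrated with a toy sketch. This is not the BindSpace implementation, and the abstract does not specify its features, dimensions, or training objective. The sketch assumes a bag-of-k-mers representation, a sequence embedding taken as the mean of its k-mer vectors, and cosine similarity as the score. The names `K`, `DIM`, `init_embeddings`, `embed_sequence`, `rank_tfs`, and the labels `TF_A`, `TF_B`, `TF_C` are hypothetical, and the vectors are random rather than learned, so the resulting ranking carries no biological meaning.

```python
import math
import random
from itertools import product

K = 3    # k-mer length (illustrative assumption, not from the paper)
DIM = 8  # embedding dimension (illustrative assumption, not from the paper)


def kmers(seq, k=K):
    """Return all overlapping k-mers of a DNA sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def init_embeddings(tf_labels, seed=0):
    """Create one random vector per k-mer and one per TF label.

    A trained model would learn these vectors from binding data;
    here they are random placeholders that only show the data layout.
    """
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0.0, 1.0) for _ in range(DIM)]

    kmer_vecs = {"".join(p): vec() for p in product("ACGT", repeat=K)}
    label_vecs = {tf: vec() for tf in tf_labels}
    return kmer_vecs, label_vecs


def embed_sequence(seq, kmer_vecs):
    """Embed a sequence as the mean of its k-mer vectors."""
    vecs = [kmer_vecs[km] for km in kmers(seq) if km in kmer_vecs]
    if not vecs:
        raise ValueError("sequence contains no valid k-mers")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_tfs(seq, kmer_vecs, label_vecs):
    """Rank TF labels by similarity to the sequence in the shared space."""
    s = embed_sequence(seq, kmer_vecs)
    scores = {tf: cosine(s, v) for tf, v in label_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical TF labels and an arbitrary example sequence.
kmer_vecs, label_vecs = init_embeddings(["TF_A", "TF_B", "TF_C"])
ranking = rank_tfs("ACGTGACGTCA", kmer_vecs, label_vecs)
for tf, score in ranking:
    print(f"{tf}\t{score:.3f}")
```

Because sequences and labels live in the same vector space, multiclass prediction reduces to a nearest-label lookup: the top-ranked label is the predicted TF, and closely related TFs would be expected to end up with nearby label vectors once the embeddings are trained.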

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Binding Sites
  • Chromatin Immunoprecipitation
  • Computational Biology / methods*
  • DNA / chemistry
  • DNA / metabolism*
  • Humans
  • Machine Learning*
  • Protein Binding
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors
  • DNA