Motif discovery and motif finding from genome-mapped DNase footprint data

Bioinformatics. 2009 Sep 15;25(18):2318-25. doi: 10.1093/bioinformatics/btp434. Epub 2009 Jul 15.

Abstract

Motivation: Footprint data are an important source of information on transcription factor recognition motifs. However, a footprinting fragment may contain no sequence similar to known protein recognition sites. Inspecting nearby genome fragments can help to identify the missing site positions.

Results: Genome fragments containing footprints were supplied to a pipeline that constructed position weight matrices (PWMs) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and with a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. The new motifs can recognize footprints with greater sensitivity than existing models at the same false positive rate. We also discuss possible overfitting of the constructed motifs.
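The idea of extending a footprint by a short flank before scanning it with a PWM can be illustrated with a minimal Python sketch. It is not the SeSiMCMC or Bigfoot algorithm: it assumes the sites are already aligned, builds a simple count-based log-odds PWM with a pseudocount and a uniform background, and reports the best-scoring window. The function names (`build_pwm`, `best_hit`, `extend_footprint`), the toy sites, and the toy genome are all hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM from pre-aligned, equal-length sites (assumed input)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        # Start every base at the pseudocount to avoid log(0).
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def best_hit(pwm, seq):
    """Return (score, start) of the best-scoring window, or None if seq is too short."""
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(col[seq[start + i]] for i, col in enumerate(pwm))
        if best is None or score > best[0]:
            best = (score, start)
    return best

def extend_footprint(genome, start, end, flank=2):
    """Return genome[start:end] widened by `flank` bp on each side, clipped to the genome."""
    lo = max(0, start - flank)
    hi = min(len(genome), end + flank)
    return genome[lo:hi]

# Toy data (hypothetical): the mapped footprint misses the first 2 bp of the site.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA"]
pwm = build_pwm(sites)
genome = "CCCTGACTCACCC"
footprint = genome[5:12]
extended = extend_footprint(genome, 5, 12, flank=2)
print(best_hit(pwm, footprint), best_hit(pwm, extended))
```

In this toy case the unextended footprint contains only a truncated site and scores poorly, while the 2 bp extension brings the full site into the scanned window. A real pipeline would also need to scan both strands and to choose a score threshold, which this sketch omits.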

Availability: Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Chromosome Mapping / methods*
  • Computational Biology / methods*
  • Deoxyribonucleases / chemistry*
  • Drosophila / genetics
  • Molecular Sequence Data
  • Software

Substances

  • Deoxyribonucleases