A data integration framework for prediction of transcription factor targets

Ann N Y Acad Sci. 2009 Mar:1158:205-14. doi: 10.1111/j.1749-6632.2008.03758.x.

Abstract

We present a computational framework for predicting targets of transcription factor regulation. The framework is based on the integration of a number of sources of evidence, derived from DNA-sequence and gene-expression data, using a weighted sum approach. Sources of evidence are prioritized based on a training set, and their relative contributions are then optimized. The performance of the proposed framework is demonstrated in the context of BCL6 target prediction. We show that this framework is able to uncover BCL6 targets reliably when biological prior information is utilized effectively, particularly in the case of sequence analysis. The framework results in a considerable gain in performance over scores in which sequence information was not incorporated. This analysis shows that with assessment of the quality and biological relevance of the data, reliable predictions can be obtained with this computational framework.
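The weighted sum integration described above can be illustrated with a minimal sketch. The abstract does not state how the weights were optimized or how performance was measured, so the coarse grid search and the AUC criterion used here are assumptions chosen for illustration. The function names, the two evidence sources, and the toy scores are all hypothetical and are not taken from the paper.

```python
from itertools import product


def weighted_score(evidence, weights):
    """Combine the per-source evidence scores of one gene into a single score."""
    return sum(w * e for w, e in zip(weights, evidence))


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fit_weights(train_evidence, train_labels, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Assumed optimizer: exhaustive grid search maximizing training-set AUC."""
    n_sources = len(train_evidence[0])
    best_weights, best_auc = None, -1.0
    for weights in product(grid, repeat=n_sources):
        if sum(weights) == 0:
            continue  # an all-zero weight vector ranks nothing
        scores = [weighted_score(e, weights) for e in train_evidence]
        current = auc(scores, train_labels)
        if current > best_auc:
            best_weights, best_auc = weights, current
    return best_weights, best_auc


# Hypothetical toy data: (sequence-based score, expression-based score) per gene.
train_evidence = [
    (0.9, 0.2), (0.8, 0.7), (0.7, 0.1),  # genes labeled as known targets
    (0.3, 0.8), (0.2, 0.3), (0.4, 0.6),  # genes labeled as non-targets
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, train_auc = fit_weights(train_evidence, train_labels)
print("weights:", weights, "training AUC:", train_auc)
```

In this toy set the first source alone separates the two classes, so the search assigns it a nonzero weight. With real data, the fitted weights would need to be checked on held-out genes to avoid overfitting to the training set.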

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Artificial Intelligence
  • Base Sequence
  • Binding Sites
  • Gene Expression Profiling
  • Gene Expression Regulation*
  • Humans
  • Oligonucleotide Array Sequence Analysis
  • Proto-Oncogene Proteins / genetics
  • Proto-Oncogene Proteins / metabolism*
  • ROC Curve
  • Repressor Proteins / genetics
  • Repressor Proteins / metabolism*
  • Software
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • BCOR protein, human
  • Proto-Oncogene Proteins
  • Repressor Proteins
  • Transcription Factors