Predicting the in vivo signature of human gene regulatory sequences

Bioinformatics. 2005 Jun;21 Suppl 1:i338-43. doi: 10.1093/bioinformatics/bti1047.

Abstract

Motivation: In the living cell nucleus, genomic DNA is packaged into chromatin. DNA sequences that regulate transcription and other chromosomal processes are associated with local disruptions, or 'openings', in chromatin structure caused by the cooperative action of regulatory proteins. Such perturbations are highly specific to cis-regulatory elements and occur over short stretches of DNA (typically about 250 bp). They can be detected experimentally in vivo as DNaseI hypersensitive sites (HSs), though the process is extremely laborious and costly. The ability to discriminate DNaseI HSs computationally would have a major impact on the annotation and utilization of the human genome.

Results: We found that a supervised pattern recognition algorithm, trained on a set of 280 DNaseI HS and 737 non-HS control sequences from erythroid cells, was capable of de novo prediction of HSs across the human genome with surprisingly high accuracy, as determined by prospective in vivo validation. Systematic application of this computational approach will greatly facilitate the discovery and analysis of functional non-coding elements in the human and other complex genomes.
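
The abstract does not name the algorithm or the sequence features used, so the following is only a minimal illustrative sketch of supervised sequence classification in general, not the authors' method. It assumes normalized k-mer frequencies as features and a nearest-centroid rule as a stand-in classifier; the value of K, the function names, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

K = 3  # assumed k-mer length; the abstract does not specify any feature set

# All 4**K possible k-mers over the DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_vector(seq):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    k-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[m] for m in KMERS) or 1  # avoid division by zero
    return [counts[m] / total for m in KMERS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(hs_seqs, control_seqs):
    """Fit the stand-in model: one centroid per class."""
    hs_c = centroid([kmer_vector(s) for s in hs_seqs])
    ctl_c = centroid([kmer_vector(s) for s in control_seqs])
    return hs_c, ctl_c


def score(model, seq):
    """Positive when seq lies closer to the HS centroid than to the control one."""
    hs_c, ctl_c = model
    v = kmer_vector(seq)
    return math.dist(v, ctl_c) - math.dist(v, hs_c)


def predict(model, seq):
    """Label a sequence as a putative HS (True) or non-HS (False)."""
    return score(model, seq) > 0


# Toy, made-up sequences purely to exercise the code; not real HS data.
toy_hs = ["GCGCGCGCGCGC", "CGCGCGGCGCGC"]
toy_controls = ["ATATATATATAT", "TATATAATATAT"]
model = train(toy_hs, toy_controls)
```

In a realistic setting the training lists would hold the labeled HS and control sequences, and held-out sequences would be scored with `score` so that a threshold could be chosen from a ROC curve rather than fixed at zero.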

Availability: Supplementary data are available at noble.gs.washington.edu/proj/hs

MeSH terms

  • Algorithms
  • Binding Sites
  • Chromatin / metabolism
  • Computational Biology / methods*
  • DNA / chemistry
  • Deoxyribonuclease I / metabolism
  • Erythrocytes / metabolism
  • False Positive Reactions
  • Genome
  • Genome, Human
  • Humans
  • Probability
  • ROC Curve
  • Reproducibility of Results
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Transcription, Genetic

Substances

  • Chromatin
  • DNA
  • Deoxyribonuclease I