Annotation of cis-regulatory elements by identification, subclassification, and functional assessment of multispecies conserved sequences

Proc Natl Acad Sci U S A. 2005 Jul 12;102(28):9830-5. doi: 10.1073/pnas.0503401102. Epub 2005 Jul 5.

Abstract

An important step toward improving the annotation of the human genome is to identify cis-acting regulatory elements from primary DNA sequence. One approach is to compare sequences from multiple, divergent species. This approach distinguishes multispecies conserved sequences (MCSs) in noncoding regions from more rapidly evolving neutral DNA. Here, we have analyzed a region of approximately 238 kb containing the human alpha globin cluster that was sequenced and/or annotated across the syntenic region in 22 species spanning 500 million years of evolution. Using a variety of bioinformatic approaches and correlating the results with many aspects of chromosome structure and function in this region, we were able to identify and evaluate the importance of 24 individual MCSs. This approach sensitively and accurately identified previously characterized regulatory elements but also discovered previously unidentified promoters, exons, splicing, and transcriptional regulatory elements. Together, these studies demonstrate an integrated approach by which to identify, subclassify, and predict the potential importance of MCSs.
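The core idea of separating conserved blocks from faster-evolving neutral DNA can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps in a plain per-column identity score averaged over a sliding window, and the function names (`column_identity`, `conserved_windows`), the window size, the threshold, and the toy sequences are all assumptions made up for illustration.

```python
from typing import List, Tuple


def column_identity(alignment: List[str]) -> List[float]:
    """Per column: fraction of sequences sharing the most common non-gap base."""
    n_cols = len(alignment[0])
    scores = []
    for i in range(n_cols):
        column = [seq[i] for seq in alignment]
        bases = [b for b in column if b != "-"]
        if not bases:
            scores.append(0.0)  # all-gap column carries no evidence
            continue
        top = max(bases.count(b) for b in set(bases))
        scores.append(top / len(column))  # gaps count against conservation
    return scores


def conserved_windows(
    alignment: List[str], window: int = 5, threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return merged half-open intervals whose mean identity meets the threshold."""
    scores = column_identity(alignment)
    merged: List[Tuple[int, int]] = []
    for start in range(len(scores) - window + 1):
        end = start + window
        if sum(scores[start:end]) / window < threshold:
            continue
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous hit: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy, invented alignment: the first 8 columns are identical in all
# four sequences; the remaining 6 columns are divergent.
toy_alignment = [
    "ACGTACGTAAGGCT",
    "ACGTACGTCTGACA",
    "ACGTACGTGCATTG",
    "ACGTACGTTATCGC",
]

print(conserved_windows(toy_alignment))  # [(0, 8)]
```

A real analysis of 22 species would need a proper multiple alignment and a model of neutral divergence rather than raw identity; the sketch only shows how a conservation threshold partitions an alignment into candidate conserved intervals.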

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Computational Biology / methods
  • Conserved Sequence / genetics*
  • Gene Components / genetics*
  • Genome, Human*
  • Genomics / methods
  • Globins / genetics*
  • Humans
  • Molecular Sequence Data
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Species Specificity

Substances

  • Globins

Associated data

  • GENBANK/AC120212
  • GENBANK/AC120213
  • GENBANK/AC120214
  • GENBANK/AC120215
  • GENBANK/AC120504
  • GENBANK/AC121214
  • GENBANK/AC139599
  • GENBANK/AC145461
  • GENBANK/AC145463
  • GENBANK/AC145465
  • GENBANK/AC145483
  • GENBANK/AC146463
  • GENBANK/AC146591
  • GENBANK/AC146782
  • GENBANK/AC148220
  • GENBANK/AC148752
  • GENBANK/AC150435
  • GENBANK/AC151883