Distinguishing regulatory DNA from neutral sites

Laura Elnitski; Ross C Hardison; Jia Li; Shan Yang; Diana Kolbe; Pallavi Eswara; Michael J O'Connor; Scott Schwartz; Webb Miller; Francesca Chiaromonte

doi:10.1101/gr.817703

Genome Res. 2003 Jan;13(1):64-72. doi: 10.1101/gr.817703.

Authors

Laura Elnitski¹, Ross C Hardison, Jia Li, Shan Yang, Diana Kolbe, Pallavi Eswara, Michael J O'Connor, Scott Schwartz, Webb Miller, Francesca Chiaromonte

Affiliation

¹ Departments of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.

Abstract

We explore several computational approaches to analyzing interspecies genomic sequence alignments, aiming to distinguish regulatory regions from neutrally evolving DNA. Human-mouse genomic alignments were collected for three sets of human regions: (1) experimentally defined gene regulatory regions, (2) well-characterized exons (coding sequences, as a positive control), and (3) interspersed repeats thought to have inserted before the human-mouse split (a good model for neutrally evolving DNA). Models that potentially could distinguish functional noncoding sequences from neutral DNA were evaluated on these three data sets, as well as bulk genome alignments. Our analyses show that discrimination based on frequencies of individual nucleotide pairs or gaps (i.e., of possible alignment columns) is only partially successful. In contrast, scoring procedures that include the alignment context, based on frequencies of short runs of alignment columns, dramatically improve separation between regulatory and neutral features. Such scoring functions should aid in the identification of putative regulatory regions throughout the human genome.
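The contrast drawn in the abstract, between scoring single alignment columns and scoring short runs of columns, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`columns`, `run_counts`, `log_odds_score`), the pseudocount smoothing, the averaging over runs, and the toy training alignments are all assumptions made for illustration. Setting `k=1` scores individual columns; a larger `k` brings in alignment context.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not taken from the paper.

def columns(human, mouse):
    """Pair two aligned, equal-length strings into alignment columns.

    Each column is a two-character string such as "AA", "AG" or "A-".
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    return [h + m for h, m in zip(human.upper(), mouse.upper())]

def run_counts(alignments, k):
    """Count every run of k consecutive columns in the training alignments."""
    counts = Counter()
    for human, mouse in alignments:
        cols = columns(human, mouse)
        for i in range(len(cols) - k + 1):
            counts[tuple(cols[i:i + k])] += 1
    return counts

def log_odds_score(human, mouse, pos_counts, neg_counts, k, pseudo=1.0):
    """Average log-odds of the k-column runs in one alignment.

    pos_counts and neg_counts are run counts from two training sets
    (for example regulatory regions and ancestral repeats). Frequencies
    are smoothed with a pseudocount (an assumed choice), so runs never
    seen in training still receive a finite score.
    """
    cols = columns(human, mouse)
    n_runs = len(cols) - k + 1
    if n_runs <= 0:
        return 0.0
    # Upper bound on distinct runs: 5 symbols (A, C, G, T, -) per species.
    n_types = 25 ** k
    pos_total = sum(pos_counts.values()) + pseudo * n_types
    neg_total = sum(neg_counts.values()) + pseudo * n_types
    total = 0.0
    for i in range(n_runs):
        run = tuple(cols[i:i + k])
        p_pos = (pos_counts[run] + pseudo) / pos_total
        p_neg = (neg_counts[run] + pseudo) / neg_total
        total += math.log(p_pos / p_neg)
    return total / n_runs

# Toy training data (invented): one conserved and one diverged alignment.
regulatory = [("ACGTACGT", "ACGTACGT")]
neutral = [("ACGTACGT", "TGCATGCA")]
k = 2
pos = run_counts(regulatory, k)
neg = run_counts(neutral, k)
conserved_score = log_odds_score("ACGT", "ACGT", pos, neg, k)
diverged_score = log_odds_score("ACGT", "TGCA", pos, neg, k)
```

In this toy setup a positive score means the runs look more like the first training set than the second. Real training sets would need to be far larger than the single alignments used here.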

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Animals
Base Composition / genetics
Base Pairing / genetics
Computational Biology / methods
DNA / genetics*
Evolution, Molecular
Exons / genetics
Humans
Interspersed Repetitive Sequences / genetics
Mice
Models, Genetic
Nucleotides / genetics
Regulatory Sequences, Nucleic Acid / genetics*
Sequence Alignment / methods

Substances

Nucleotides
DNA
