Tracing the evolution of lineage-specific transcription factor binding sites in a birth-death framework

PLoS Comput Biol. 2014 Aug 21;10(8):e1003771. doi: 10.1371/journal.pcbi.1003771. eCollection 2014 Aug.

Abstract

Changes in cis-regulatory element composition that result in novel patterns of gene expression are thought to be a major contributor to the evolution of lineage-specific traits. Although transcription factor binding events show substantial variation across species, most computational approaches to studying regulatory elements focus primarily on highly conserved sites and rely heavily on multiple sequence alignments. However, conservation-based approaches have limited ability to detect lineage-specific elements that could contribute to species-specific traits. In this paper, we describe a novel framework that uses a birth-death model to trace the evolution of lineage-specific binding sites without relying on detailed base-by-base cross-species alignments. We applied our model to ChIP-seq data for six transcription factors (GATA1, SOX2, CTCF, MYC, MAX, ETS1) to analyze the evolution of their binding sites along the human lineage since the human-mouse common ancestor. We estimate that a substantial fraction of human binding sites (∼58-79% for each factor) originated after the divergence from mouse. Over 15% of all binding sites are unique to hominids. Such elements are often enriched near genes associated with specific pathways and harbor more common SNPs than older binding sites in the human genome. These results support the ability of our method to identify lineage-specific regulatory elements and to help understand their roles in shaping variation in gene regulation across species.
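To make the birth-death idea concrete, the sketch below treats a single binding site on a single lineage as a generic two-state continuous-time Markov chain (absent = 0, present = 1) with a constant birth (gain) rate and death (loss) rate. This is an illustrative simplification, not the model described in the paper: the function names, the constant rates, the stationarity assumption, and the single-branch setting are all hypothetical choices made here for illustration, and the sketch does not use ChIP-seq data or multiple species. Under these assumptions it asks how likely it is that a site present today was absent at some ancestral time t.

```python
import math


def transition_matrix(gain: float, loss: float, t: float) -> list[list[float]]:
    """Closed-form exp(Q t) for a hypothetical two-state gain/loss chain.

    States: 0 = site absent, 1 = site present.
    gain: assumed birth rate (0 -> 1); loss: assumed death rate (1 -> 0);
    t: elapsed time, in the same (arbitrary) units as the rates.
    Returns P where P[i][j] = Pr(state j at time t | state i at time 0).
    """
    total = gain + loss
    pi1 = gain / total  # stationary probability that a site is present
    pi0 = loss / total  # stationary probability that a site is absent
    decay = math.exp(-total * t)  # memory of the starting state fades at rate gain + loss
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],  # starting from absent
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],  # starting from present
    ]


def fraction_born_since(gain: float, loss: float, t: float) -> float:
    """Pr(site absent t time units ago | site present today), assuming stationarity.

    Looking backward in time: Pr(X(-t)=0 | X(0)=1) = pi0 * P01(t) / pi1,
    which equals P10(t) by detailed balance (pi0 * P01 = pi1 * P10).
    """
    return transition_matrix(gain, loss, t)[1][0]


if __name__ == "__main__":
    # Illustrative, made-up rates; these are not estimates from the paper.
    for t in (0.1, 0.5, 1.0, 5.0):
        print(t, round(fraction_born_since(0.3, 0.7, t), 4))
```

As t grows, the probability approaches the stationary absence probability loss / (gain + loss), because a present-day site carries no information about a sufficiently distant ancestral state. The paper's framework addresses the harder problem of inferring such quantities from observed binding data without base-by-base alignments, which this toy chain does not attempt.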

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Binding Sites / genetics*
  • Computational Biology
  • Evolution, Molecular*
  • Humans
  • Mice
  • Models, Genetic*
  • Molecular Sequence Data
  • Polymorphism, Single Nucleotide
  • Primates
  • Sequence Alignment
  • Species Specificity
  • Transcription Factors / genetics*

Substances

  • Transcription Factors