Evidence for evolutionary and nonevolutionary forces shaping the distribution of human genetic variants near transcription start sites

PLoS One. 2014 Dec 4;9(12):e114432. doi: 10.1371/journal.pone.0114432. eCollection 2014.

Abstract

The regions surrounding transcription start sites (TSSs) of genes play a critical role in the regulation of gene expression. At the same time, current evidence indicates that these regions are particularly stressed by transcription-related mutagenic phenomena. In this work we performed a genome-wide analysis of the distribution of single nucleotide polymorphisms (SNPs) inside the 10 kb region flanking human TSSs by dividing SNPs into four classes according to their frequency (rare, two intermediate classes, and common). We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their position relative to the TSS, and that it generally differs between TSSs located inside and outside CpG islands. We also found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. Furthermore, our analysis suggests that evolutionary (purifying selection) and nonevolutionary (biased gene conversion) forces both play a role in determining the relative SNP frequency around TSSs. Finally, we analyzed the potential pathogenicity of each class of variant using the Combined Annotation Dependent Depletion score. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs and offers insight into the forces that instigate and maintain variability in such critical regions.
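
The binning described above (four frequency classes, positions measured relative to the TSS) might be sketched as follows. This is a minimal illustration, not the authors' pipeline: the frequency cutoffs, the window of 5 kb on each side of the TSS, the 1 kb bin size, and all function names are assumptions, since the abstract does not state them.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoffs on allele frequency; the abstract does not give them.
FREQ_CUTOFFS = (0.01, 0.05, 0.20)
FREQ_LABELS = ("rare", "intermediate_low", "intermediate_high", "common")


def frequency_class(freq, cutoffs=FREQ_CUTOFFS):
    """Assign an allele frequency to one of four classes."""
    return FREQ_LABELS[bisect_right(cutoffs, freq)]


def relative_position(snp_pos, tss_pos, strand):
    """Signed distance from the TSS; negative means upstream of the gene."""
    offset = snp_pos - tss_pos
    return offset if strand == "+" else -offset


def count_snps(snps, tss_pos, strand, half_window=5000, bin_size=1000):
    """Count SNPs per (frequency class, bin start) around a single TSS.

    `snps` is an iterable of (genomic position, allele frequency) pairs.
    The half_window and bin_size defaults are assumed values.
    """
    counts = Counter()
    for pos, freq in snps:
        rel = relative_position(pos, tss_pos, strand)
        if -half_window <= rel < half_window:
            bin_start = (rel // bin_size) * bin_size
            counts[(frequency_class(freq), bin_start)] += 1
    return counts


if __name__ == "__main__":
    toy_snps = [(10100, 0.005), (9500, 0.30), (20000, 0.10)]
    print(count_snps(toy_snps, tss_pos=10000, strand="+"))
```

Summing such counters over all TSSs, separately for TSSs inside and outside CpG islands, would give per-class positional profiles of the kind the abstract compares.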

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • CpG Islands
  • Evolution, Molecular*
  • Gene Frequency
  • Humans
  • Models, Genetic
  • Polymorphism, Single Nucleotide*
  • Promoter Regions, Genetic
  • Sequence Analysis, DNA
  • Transcription Initiation Site*
  • Transcription, Genetic

Grants and funding

GS and OA are recipients of a doctoral fellowship from the Doctorate of Computational Biology and Bioinformatics, University "Federico II", Naples. This work was partially supported by the Epigenomic Flagship Project-Epigen, CNR and by POR Campania FSE 2007–2013, Project CREME. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.