Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences

Hum Mol Genet. 2005 Sep 1;14(17):2533-46. doi: 10.1093/hmg/ddi257. Epub 2005 Jul 21.

Abstract

The non-coding portion of the human genome is punctuated by a large number of multispecies conserved sequence (MCS) elements of largely unknown function. We demonstrate that MCSs are unevenly distributed in human introns: most relatively short introns (< 9 kb long) contain no or only a few MCSs, whereas in longer introns MCS density reaches up to 10% of total intron length. After correction for intron length, MCSs were found to be enriched within genes involved in development and transcription and depleted in immune response loci. Moreover, many central nervous system tissues preferentially express MCS-rich genes, and MCS enrichment correlates significantly with gene functional complexity in terms of distinct protein domains. Analysis of human-mouse orthologous pairs revealed a significant association between intronic MCS density and the conservation of protein sequences, promoter regions and untranslated sequences. MCS density also correlates with the predicted occurrence of human-mouse conserved alternative splicing events. These observations suggest that evolution acts on human genes as integrated units of coding and regulatory capacity, and that functional complexity might represent a major source of negative selection on non-coding sequences. To substantiate our results, we also examined previously experimentally identified intronic regulatory elements and found that about half of these sequences map to an MCS. In particular, we provide support for the notion that mutations in MCSs can result in human genetic diseases: three previously identified intronic pathological variations were found to occur within MCSs, and human disease and cancer genes were found to be significantly enriched in MCSs.
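The abstract expresses intronic MCS density as the share of an intron's total length covered by MCS elements (up to 10% in longer introns). The Python snippet below is a minimal sketch of that arithmetic only. It assumes MCS calls are supplied as half-open genomic intervals on the same chromosome as the intron. The function name `mcs_density` and the interval representation are hypothetical illustrations, not the authors' pipeline. Overlapping elements are merged so that no base is counted twice.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) coordinates on one chromosome.
Interval = Tuple[int, int]


def mcs_density(intron: Interval, mcs_elements: List[Interval]) -> float:
    """Fraction of intron bases covered by MCS elements (illustrative helper).

    Each element is clipped to the intron boundaries, and overlapping or
    touching elements are merged before counting covered bases.
    """
    start, end = intron
    length = end - start
    if length <= 0:
        raise ValueError("intron must have positive length")

    # Keep only elements that overlap the intron, clipped to its boundaries.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in mcs_elements
        if e > start and s < end
    )

    covered = 0
    cur_s, cur_e = None, None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Disjoint from the current merged block: flush it and start a new one.
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s

    return covered / length


print(mcs_density((0, 20000), [(1000, 1500), (1400, 2000), (19900, 20500)]))
```

For instance, in a 20 kb intron the three hypothetical elements above merge and clip to 1,000 + 100 = 1,100 covered bases, giving a density of 1,100 / 20,000 = 0.055 (5.5%).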

MeSH terms

  • Animals
  • Conserved Sequence
  • Gene Expression Regulation
  • Genome, Human*
  • Humans
  • Introns / genetics*
  • Models, Genetic
  • Selection, Genetic*
  • Species Specificity
  • Transcription, Genetic