Reduced purifying selection prevails over positive selection in human copy number variant evolution

Genome Res. 2008 Nov;18(11):1711-23. doi: 10.1101/gr.077289.108. Epub 2008 Aug 7.

Abstract

Copy number variation is a dominant contributor to genomic variation and may frequently underlie an individual's variable susceptibilities to disease. Here we question our previous proposition that copy number variants (CNVs) are often retained in the human population because of their adaptive benefit. We show that genic biases of CNVs are best explained, not by positive selection, but by reduced efficiency of selection in eliminating deleterious changes from the human population. Of four CNV data sets examined, three exhibit significant increases in protein evolutionary rates. These increases appear to be attributable to the frequent coincidence of CNVs with segmental duplications (SDs) that recombine infrequently. Furthermore, human orthologs of mouse genes whose disruption results in pre- or postnatal lethality are unusually depleted in CNVs. Together, these findings support a model of reduced purifying selection (Hill-Robertson interference) within copy number variable regions that are enriched in nonessential genes, allowing both the fixation of slightly deleterious substitutions and increased drift of CNV alleles. Additionally, all four CNV sets exhibit increased rates of interspecies chromosomal rearrangement and nucleotide substitution and an increased gene density. We observe that sequences with high G+C contents are most prone to copy number variation. In particular, frequently duplicated human SD sequence, or CNVs that are large and/or observed frequently, tend to be elevated in G+C content. In contrast, SD sequences that appear fixed in the human population lie more frequently within low G+C sequence. These findings provide an overarching view of how CNVs arise and segregate in the human population.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Biological Evolution
  • Chromosomes, Artificial, Bacterial / genetics
  • Databases, Genetic
  • GC Rich Sequence
  • Gene Dosage*
  • Genetic Variation
  • Genome, Human
  • Humans
  • Mice
  • Models, Genetic
  • Selection, Genetic*
  • Time Factors