GC Content Across Insect Genomes: Phylogenetic Patterns, Causes and Consequences

J Mol Evol. 2024 Apr;92(2):138-152. doi: 10.1007/s00239-024-10160-5. Epub 2024 Mar 15.

Abstract

The proportions of A:T and G:C nucleotide pairs are often unequal and can vary greatly between animal species and along chromosomes. The causes and consequences of this variation are incompletely understood. The recent release of high-quality genome sequences from the Darwin Tree of Life and other large-scale genome projects provides an opportunity to compare GC heterogeneity across a large number of insect species. Here we analyse GC content along chromosomes, and within protein-coding genes and codons, of 150 insect species from four holometabolous orders: Coleoptera, Diptera, Hymenoptera, and Lepidoptera. We find that protein-coding sequences have higher GC content than the genome average, and that Lepidoptera generally have higher GC content than the other three insect orders examined. GC content is higher in small chromosomes in most Lepidoptera species, but this pattern is less consistent in other orders. GC content also increases towards subtelomeric regions within protein-coding genes in Diptera, Coleoptera and Lepidoptera. Two species of Diptera, Bombylius major and B. discolor, have highly atypical genomes with a ubiquitous increase in AT content, especially at third codon positions. Despite this dramatically AT-biased codon usage, we find no evidence that it has driven divergent protein evolution. We argue that the GC landscape of Lepidoptera, Diptera and Coleoptera genomes is influenced by GC-biased gene conversion, which is strongest in Lepidoptera, with some outlier taxa affected drastically by counteracting processes.
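
The quantities compared in the abstract (GC content of a sequence, GC content at third codon positions, and GC content along a chromosome) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `gc_content`, `gc3_content` and `windowed_gc`, the choice to ignore ambiguous bases such as N, and the use of non-overlapping windows are all assumptions made here for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols (e.g. N) are ignored rather than
    counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else float("nan")


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions.

    Assumption: `cds` starts in frame at a first codon position.
    A trailing incomplete codon contributes no third position.
    """
    return gc_content(cds[2::3])


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC content in consecutive non-overlapping windows along a sequence.

    The last window may be shorter than `window`.
    """
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]
```

For example, `gc3_content("ATGGCCTAA")` considers the third bases G, C and A of the codons ATG, GCC and TAA, giving 2/3. A strongly AT-biased third position, as reported for the two Bombylius species, would show up as a low value of this statistic relative to the overall `gc_content` of the same coding sequence.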

Keywords: Bee-flies; Biased gene conversion; Genome evolution; Nucleotide composition.

MeSH terms

  • Animals
  • Base Composition
  • Codon / genetics
  • Evolution, Molecular
  • Genome, Insect* / genetics
  • Insecta* / genetics
  • Phylogeny

Substances

  • Codon