Evolution of Conserved Noncoding Sequences in Arabidopsis thaliana

Mol Biol Evol. 2021 Jun 25;38(7):2692-2703. doi: 10.1093/molbev/msab042.

Abstract

Recent pangenome studies have revealed that a large fraction of the gene content within a species exhibits presence-absence variation (PAV). However, coding regions alone provide an incomplete assessment of functional genomic sequence variation at the species level. Little to no attention has been paid to noncoding regulatory regions in pangenome studies, though these sequences directly modulate gene expression and phenotype. To uncover regulatory genetic variation, we generated chromosome-scale genome assemblies for thirty Arabidopsis thaliana accessions from multiple distinct habitats and characterized species-level variation in Conserved Noncoding Sequences (CNS). Our analyses uncovered not only PAV and positional variation (PosV) in CNS but also that this diversity is nonrandom, with variants shared across different accessions. Using evolutionary analyses and chromatin accessibility data, we provide further evidence supporting roles for conserved and variable CNS in gene regulation. Additionally, our data suggest that transposable elements contribute to CNS variation. Characterizing species-level diversity in all functional genomic sequences may later uncover previously unknown mechanistic links between genotype and phenotype.

Keywords: Molecular evolution; conserved noncoding sequence; intraspecific genomics.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Arabidopsis / genetics*
  • Conserved Sequence*
  • Evolution, Molecular*
  • Gene Duplication
  • Genetic Variation*
  • Genome, Plant
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Selection, Genetic

Associated data

  • Dryad/10.5061/dryad.pzgmsbcfv