Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana

Bioinformatics. 2004 May 1;20(7):1081-6. doi: 10.1093/bioinformatics/bth043. Epub 2004 Feb 5.

Abstract

Motivation: Simple sequence repeats (SSRs), or microsatellites, are abundant in many genomes. However, the significance of their distribution preferences is not completely understood. The completed Arabidopsis genome sequence allows us to better understand and characterize microsatellites.

Results: Microsatellites were more abundant in the 5'-flanking regions of genes than expected from the whole genome, with an over-representation of AG and AAG repeats. This distribution clearly differed from those in 3'-flanks and coding fractions, where triplet frequencies evidently corresponded to codon usage. We identified 1140 full-length genes that contained at least one locus of AG or AAG repeats in their upstream sequences, and whose functional characteristics were significantly associated with the repeats. This observation indicates that selective pressure markedly differed in the three transcribed regions, with positive selection of AG and AAG repeats in 5'-flanks close to those genes whose products are preferentially involved in transcription.
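
To make the notion of an AG or AAG repeat locus in an upstream sequence concrete, the following is a minimal sketch of one way such loci could be located. It is not the authors' method: the abstract does not state the detection algorithm or thresholds, so the function name `find_ssr_loci`, the `min_repeats` cutoff of 5, and the example sequence are all hypothetical. The sketch scans a single strand only and finds perfect tandem repeats.

```python
import re


def find_ssr_loci(sequence, motif, min_repeats=5):
    """Return (start, end, repeat_count) for each perfect tandem run of `motif`.

    Assumption: a "locus" is `motif` repeated at least `min_repeats` times
    with no interruptions; the cutoff of 5 is illustrative, not from the paper.
    Coordinates are 0-based, end-exclusive.
    """
    # e.g. motif="AG", min_repeats=5 -> (?:AG){5,}
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif.upper()), min_repeats))
    loci = []
    for match in pattern.finditer(sequence.upper()):
        repeat_count = (match.end() - match.start()) // len(motif)
        loci.append((match.start(), match.end(), repeat_count))
    return loci


# Hypothetical upstream fragment containing one AG locus and one AAG locus.
upstream = "TTGCA" + "AG" * 6 + "CCT" + "AAG" * 5 + "GATC"

for motif in ("AG", "AAG"):
    print(motif, find_ssr_loci(upstream, motif))
```

A genome-scale analysis would additionally need gene annotations to delimit 5'-flanks, 3'-flanks, and coding regions, and would likely also count the complementary motifs (CT and CTT) on the opposite strand; neither is covered by this sketch.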

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Arabidopsis / genetics*
  • Base Sequence
  • DNA, Plant / genetics*
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Plant / genetics
  • Genome, Plant
  • Microsatellite Repeats / genetics
  • Molecular Sequence Data
  • Open Reading Frames
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Plant