Patterns of tandem repetition in plant whole genome assemblies

Mol Genet Genomics. 2009 Jun;281(6):579-90. doi: 10.1007/s00438-009-0433-y. Epub 2009 Feb 26.

Abstract

Tandem repeats often confound the assembly of large genomes. To assess how whole genome assemblies handle this fraction of DNA, we surveyed tandemly arrayed repetitive sequences in the genome sequences of the green alga Chlamydomonas reinhardtii, the moss Physcomitrella patens, the monocots rice and sorghum, and the dicots Arabidopsis thaliana, poplar, grapevine, and papaya. Our results suggest that plant genome assemblies preferentially include tandem repeats composed of shorter monomeric units (especially dinucleotide and 9-30-bp repeats), whereas repeats with longer units are more difficult to assemble. Nevertheless, although currently available sequencing technologies struggle with large arrays of repeated DNA, major well-known repetitive elements, including centromeric and telomeric repeats as well as high-copy-number genes, were found to be reasonably well represented. A database of all tandem repeat sequences characterized here was created to support future comparative genomic analyses.
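To make "tandem array" and "monomeric unit" concrete, the Python sketch below scans a sequence for perfect (mismatch-free) tandem repeats and bins them by monomer length. It is an illustrative assumption, not the pipeline used in the study: the function names, the `min_copies` threshold, the `max_period` limit, and the size-class boundaries are all hypothetical. Dedicated tools such as Tandem Repeats Finder also tolerate mismatches and indels, which this sketch does not.

```python
from collections import Counter
from typing import Iterator, NamedTuple


class TandemArray(NamedTuple):
    start: int      # 0-based start of the array in the sequence
    period: int     # length of the monomeric unit in bp
    copies: float   # number of (possibly partial) unit copies in the array
    unit: str       # the monomeric unit as it first appears


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit ('ATAT' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_perfect_tandem_arrays(seq: str, max_period: int = 200,
                               min_copies: float = 2.0) -> Iterator[TandemArray]:
    """Yield perfect tandem arrays for unit lengths 1..max_period (hypothetical API)."""
    seq = seq.upper()
    n = len(seq)
    for p in range(1, max_period + 1):
        i = 0
        while i + p < n:
            # Extend a maximal run of positions where seq[j] == seq[j + p].
            j = i
            while j + p < n and seq[j] == seq[j + p]:
                j += 1
            run = j - i  # matching positions; the array spans run + p bases
            if run > 0:
                length = run + p
                unit = seq[i:i + p]
                # Skip non-primitive units so 'ATAT' is reported only as 'AT'.
                if length / p >= min_copies and is_primitive(unit):
                    yield TandemArray(i, p, length / p, unit)
                i = j
            else:
                i += 1


def size_class(period: int) -> str:
    """Illustrative monomer-length classes; the bin edges are an assumption."""
    if period == 1:
        return "mononucleotide"
    if period == 2:
        return "dinucleotide"
    if period <= 8:
        return "3-8 bp"
    if period <= 30:
        return "9-30 bp"
    return ">30 bp"


def summarize(seq: str, max_period: int = 200) -> Counter:
    """Count detected arrays per monomer-length class."""
    return Counter(size_class(a.period)
                   for a in find_perfect_tandem_arrays(seq, max_period))


if __name__ == "__main__":
    toy = "GC" + "AT" * 6 + "G" + "ACGTTGCAAC" * 3  # hypothetical toy sequence
    for arr in find_perfect_tandem_arrays(toy):
        print(arr, size_class(arr.period))
    print(summarize(toy))
```

The scan is O(n × max_period), which is fine for illustration but not for whole chromosomes; the primitivity check keeps a dinucleotide array from also being reported under periods 4, 6, and so on.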

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arabidopsis / genetics
  • Bryopsida / genetics
  • Carica / genetics
  • Chlamydomonas reinhardtii / genetics
  • DNA, Plant / genetics*
  • Genes, Plant
  • Genetic Markers
  • Genome, Plant*
  • Oryza / genetics
  • Populus / genetics
  • Repetitive Sequences, Nucleic Acid
  • Sorghum / genetics
  • Tandem Repeat Sequences*
  • Vitis / genetics

Substances

  • DNA, Plant
  • Genetic Markers