HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes

Proc Natl Acad Sci U S A. 2014 Jul 15;111(28):10263-8. doi: 10.1073/pnas.1410068111. Epub 2014 Jun 30.

Abstract

Transposons make up the bulk of eukaryotic genomes, but are difficult to annotate because they evolve rapidly. Most of the unannotated portion of sequenced genomes is probably made up of various divergent transposons that have yet to be categorized. Helitrons are unusual rolling-circle eukaryotic transposons that often capture gene sequences, making them of considerable evolutionary importance. Unlike other DNA transposons, Helitrons do not end in inverted repeats or create target site duplications, so they are particularly challenging to identify. Here we present HelitronScanner, a two-layered local combinational variable (LCV) tool for generalized Helitron identification that represents a major improvement over previous identification programs based on DNA sequence or structure. HelitronScanner identified 64,654 Helitrons from a wide range of plant genomes in a highly automated way. We tested HelitronScanner's predictive ability in maize, a species with highly heterogeneous Helitron elements. LCV scores for the 5' and 3' termini of the predicted Helitrons provide a primary confidence level, and element copy number provides a secondary one. Newly identified Helitrons were validated by PCR assays or by in silico comparative analysis of insertion site polymorphism among multiple accessions. Many new Helitrons were identified in model species, such as maize, rice, and Arabidopsis, and in a variety of organisms where, to our knowledge, Helitrons had not been reported previously, leading to a major upward reassessment of their abundance in plant genomes. HelitronScanner promises to be a valuable tool in future comparative and evolutionary studies of this major transposon superfamily.
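
The two confidence levels described in the abstract (terminal LCV scores first, copy number second) can be illustrated with a minimal Python sketch. This is not HelitronScanner's code or its actual scoring scheme: the `Candidate` record, its field names, the example values, and the choice to summarize the two termini by the weaker of their scores are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one predicted Helitron (not the tool's format)."""
    name: str
    head_score: float   # assumed LCV score of the 5' terminus
    tail_score: float   # assumed LCV score of the 3' terminus
    copy_number: int    # number of copies of the element in the genome


def rank_candidates(candidates):
    """Order candidates from most to least confident.

    Primary key: the weaker of the two terminal scores (an assumption;
    the abstract does not say how the two scores are combined).
    Secondary key: copy number, used only to break ties.
    """
    return sorted(
        candidates,
        key=lambda c: (min(c.head_score, c.tail_score), c.copy_number),
        reverse=True,
    )


# Made-up values, purely to show the ordering behavior.
examples = [
    Candidate("A", head_score=10, tail_score=8, copy_number=1),
    Candidate("B", head_score=10, tail_score=8, copy_number=5),
    Candidate("C", head_score=12, tail_score=6, copy_number=9),
]
ranked_names = [c.name for c in rank_candidates(examples)]
print(ranked_names)
```

In this sketch, B outranks A because the two tie on terminal scores and B has more copies, while C ranks last despite its high copy number because its weaker terminus scores lowest.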

Keywords: algorithm; bioinformatic analysis; computational tool; transposition.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • DNA Transposable Elements / physiology*
  • Evolution, Molecular*
  • Genome, Plant / physiology*
  • Plants / genetics*
  • Polymerase Chain Reaction / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA Transposable Elements