Background: Retrotransposons are an abundant component of eukaryotic genomes. The high quality of the Arabidopsis thaliana genome sequence makes it possible to comprehensively characterize retroelement populations and explore factors that contribute to their genomic distribution.
Results: We identified the full complement of A. thaliana long terminal repeat (LTR) retroelements using RetroMap, a software tool that iteratively searches genome sequences for reverse transcriptases and then defines retroelement insertions. Relative ages of full-length elements were estimated by assessing sequence divergence between LTRs: the Pseudoviridae were significantly younger than the Metaviridae. All retroelement insertions were mapped onto the genome sequence and their distribution was distinctly non-uniform. Although both Pseudoviridae and Metaviridae tend to cluster within pericentromeric heterochromatin, this association is significantly more pronounced for all three Metaviridae sublineages (Metavirus, Tat and Athila). Among these, Tat and Athila are strictly associated with pericentromeric heterochromatin.
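The LTR-based age comparison described in the Results can be illustrated with a minimal sketch. The two LTRs of an element are identical at the time of insertion, so the divergence between them grows with the time since insertion. The function names, the toy sequences, and the choice of the Jukes-Cantor correction are assumptions made for illustration; the text above does not state which distance measure was used, and the sketch expects LTR pairs that have already been aligned.

```python
import math


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatched sites between two pre-aligned LTRs.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites in alignment")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (assumed model, not from the source)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy LTR pairs: a lower corrected distance suggests a
# more recent insertion relative to the other element.
younger = jukes_cantor(p_distance("ACGTACGTACGTACGTACGT",
                                  "ACGTACGTACGAACGTACGT"))
older = jukes_cantor(p_distance("ACGTACGTACGTACGTACGT",
                                "ACCTACGTACGAACGTTCGT"))
is_younger = younger < older
```

Only the relative ordering of distances is used here, matching the relative ages reported above; converting a distance to absolute time would require a substitution rate, which this sketch does not assume.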
Conclusions: The non-uniform genomic distribution of the Pseudoviridae and the Metaviridae can be explained by a variety of factors including target-site bias, selection against integration into euchromatin and pericentromeric accumulation of elements as a result of suppression of recombination. However, comparisons based on the age of elements and their chromosomal location indicate that integration-site specificity is likely to be the primary factor determining distribution of the Athila and Tat sublineages of the Metaviridae. We predict that, like retroelements in yeast, the Athila and Tat elements target integration to pericentromeric regions by recognizing a specific feature of pericentromeric heterochromatin.