Background: Dissecting genome organization is indispensable for further functional and applied studies. Genome sequence data show that cotton genomes contain more than 60 % repetitive sequences, so studying the composition, structure, and distribution of repetitive sequences is a key step in dissecting the cotton genome.
Results: In this study, a bacterial artificial chromosome (BAC) clone enriched in repetitive sequences was first identified by fluorescence in situ hybridization (FISH). In FISH with allotetraploid cotton as target DNA, dispersed signals were detected on most regions of all A sub-genome chromosomes but only on the middle regions of all D sub-genome chromosomes. Further FISH with other cotton species bearing the A or D genome as target DNA also showed specific signals. BAC sequencing and bioinformatic analysis identified 129 repeat elements totaling about 57,172 bp, accounting for more than 62 % of the BAC sequence (91,238 bp). Among them, a type of long terminal repeat retrotransposon (LTR-RT), LTR/Gypsy, was the key element responsible for the specific FISH results. When the BAC fragments matching the identified Gypsy-like LTR were used as probes, FISH signals resembling those of BAC 57I23 were reproduced. In BLASTN searches, the fragments matched well to all chromosomes of the G. arboreum (A2) genome and the A sub-genome of G. hirsutum (AD1), matched less well to all chromosomes of the D sub-genome of AD1, and showed little match to the chromosomes of the G. raimondii (D5) genome, which was consistent with the FISH results.
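The repeat proportion quoted in the Results follows from the two reported lengths. As a minimal sketch of that arithmetic (the function and variable names are illustrative assumptions and are not taken from the study), it can be checked as follows:

```python
def repeat_fraction(repeat_bp: int, total_bp: int) -> float:
    """Return the fraction of a sequence covered by annotated repeats."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    if not 0 <= repeat_bp <= total_bp:
        raise ValueError("repeat_bp must lie between 0 and total_bp")
    return repeat_bp / total_bp


# Values reported in the Results: total repeat length and BAC length, in bp.
REPEAT_BP = 57_172
BAC_LENGTH_BP = 91_238

fraction = repeat_fraction(REPEAT_BP, BAC_LENGTH_BP)
print(f"Repeat content: {fraction:.1%}")  # prints "Repeat content: 62.7%"
```

The result agrees with the statement that repeats account for more than 62 % of the BAC sequence.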
Conclusion: A repeat-enriched cytogenetic marker that identifies the A and D sub-genomes of Gossypium was discovered by FISH. By combining sequence analysis with FISH verification, the assembly quality of repetitive sequences in the allotetraploid cotton draft genome was assessed, and more accurate chromosome assignment was verified. We also found that the genomic distribution of the identified Gypsy LTR-RT was similar to that of heterochromatin. The expansion of this type of Gypsy LTR-RT in heterochromatic regions may be one of the major reasons for the size difference between the A and D genomes. These findings will help in understanding the composition, structure, and evolution of the cotton genome and will contribute to further improvement of the cotton draft genomes.
Keywords: BAC; FISH; Gossypium; LTR-RT; Repetitive sequences.