A preliminary analysis of genome structure and composition in Gossypium hirsutum

BMC Genomics. 2008 Jul 1:9:314. doi: 10.1186/1471-2164-9-314.

Abstract

Background: Upland cotton has the highest yield and accounts for more than 95% of world cotton production. Decoding the upland cotton genome will undoubtedly provide the ultimate reference and resource for structural, functional, and evolutionary studies of the species. Here, we employed GeneTrek and BAC tagging information approaches to predict the general composition and structure of the allotetraploid cotton genome.

Results: A total of 142 BAC sequences from Gossypium hirsutum cv. Maxxa were downloaded from NCBI (http://www.ncbi.nlm.nih.gov) and confirmed. Analysis of these BAC sequences revealed that the tetraploid cotton genome contains over 70,000 candidate genes, with duplicated gene copies in homoeologous A- and D-subgenome regions. Gene distribution is uneven, with gene-rich and gene-free regions of the genome. Twenty-one percent of the 142 BACs lacked genes. Gene density per BAC ranged from 0 to 33.2 genes per 100 kb, and most gene islands contained only one gene, with an average of 1.5 genes per island. Retro-elements were a major component of the genome, with LTR/gypsy the most enriched class, followed by LTR/copia. Most LTR retrotransposons were truncated and arranged in nested structures. In addition, 166 polymorphic loci amplified with SSRs developed from 70 BAC clones were tagged on our backbone genetic map. Seventy-five percent (125/166) of the polymorphic loci were tagged on the D-subgenome. By comprehensively analyzing the molecular size of amplified products among tetraploid G. hirsutum cv. Maxxa, acc. TM-1, and G. barbadense cv. Hai7124, and diploid G. herbaceum var. africanum and G. raimondii, 37 BACs, 12 from the A- and 25 from the D-subgenome, were further anchored to their corresponding subgenome chromosomes. Comparison of a large number of gene sequences from BACs of the different subgenomes suggested that introns might not contribute to the difference in subgenome size in Gossypium.
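The proportions and densities reported above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names are illustrative, and the 150 kb BAC carrying 3 genes is a hypothetical example chosen only to show how a "genes per 100 kb" figure is computed.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def genes_per_100kb(gene_count, bac_length_bp):
    """Normalize a gene count to a density per 100 kb of sequence."""
    return gene_count * 100_000 / bac_length_bp


# Share of polymorphic loci tagged on the D-subgenome (125 of 166).
d_share = percent(125, 166)

# BACs anchored to subgenome chromosomes: 12 (A) + 25 (D).
anchored_total = 12 + 25

# Approximate number of gene-free BACs implied by "21% of 142".
gene_free_bacs = round(0.21 * 142)

# Hypothetical BAC (assumed values, not from the study): 3 genes in 150 kb.
example_density = genes_per_100kb(3, 150_000)

print(f"D-subgenome share: {d_share:.1f}%")
print(f"Anchored BACs: {anchored_total}")
print(f"Gene-free BACs (approx.): {gene_free_bacs}")
print(f"Example density: {example_density} genes per 100 kb")
```

Normalizing to 100 kb makes BACs of different insert sizes directly comparable, which is what allows the reported range of 0 to 33.2 genes per 100 kb to be read across all 142 clones.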

Conclusion: This study provides a first glimpse of cotton genome complexity and serves as a foundation for future whole-genome sequencing of tetraploid cotton.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosomes, Artificial, Bacterial* / genetics
  • Chromosomes, Plant
  • Evolution, Molecular
  • Genes, Duplicate
  • Genetic Linkage
  • Genetic Markers
  • Genome, Plant*
  • Gossypium / genetics*
  • Microsatellite Repeats
  • Models, Genetic
  • Physical Chromosome Mapping
  • Polymorphism, Genetic
  • Polyploidy*
  • Random Amplified Polymorphic DNA Technique
  • Retroelements
  • Sequence Analysis, DNA*
  • Software

Substances

  • Genetic Markers
  • Retroelements