Genome sequence of the cultivated cotton Gossypium arboreum

Nat Genet. 2014 Jun;46(6):567-72. doi: 10.1038/ng.2987. Epub 2014 May 18.

Abstract

The complex allotetraploid nature of the cotton genome (AADD; 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled the Gossypium arboreum (AA; 2n = 26) genome, a putative contributor of the A subgenome. A total of 193.6 Gb of clean sequence, providing 112.6-fold coverage of the genome, was obtained by paired-end sequencing. We further anchored and oriented 90.4% of the assembly on 13 pseudochromosomes and found that 68.5% of the genome is occupied by repetitive DNA sequences. We predicted 41,330 protein-coding genes in G. arboreum. Two whole-genome duplications were shared by G. arboreum and Gossypium raimondii before speciation. Insertions of long terminal repeats in the past 5 million years are responsible for the twofold difference in the sizes of these genomes. Comparative transcriptome studies showed the key role of the nucleotide binding site (NBS)-encoding gene family in resistance to Verticillium dahliae and the involvement of ethylene in the development of cotton fiber cells.
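The sequencing yield and depth quoted in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that fold coverage was defined as total clean bases divided by genome size; the abstract does not say how depth or genome size were estimated, so the derived figures are only an illustration, not values reported by the authors. The function name `implied_genome_size_gb` is hypothetical.

```python
def implied_genome_size_gb(total_bases_gb: float, fold_coverage: float) -> float:
    """Back-calculate genome size (Gb) from sequencing yield and depth.

    Assumption: depth = total clean bases / genome size.
    """
    if fold_coverage <= 0:
        raise ValueError("fold_coverage must be positive")
    return total_bases_gb / fold_coverage


# Figures quoted in the abstract
CLEAN_SEQUENCE_GB = 193.6   # clean paired-end sequence
FOLD_COVERAGE = 112.6       # reported fold coverage
REPEAT_FRACTION = 0.685     # share of the genome that is repetitive DNA

genome_gb = implied_genome_size_gb(CLEAN_SEQUENCE_GB, FOLD_COVERAGE)
repeat_gb = genome_gb * REPEAT_FRACTION

print(f"implied genome size: {genome_gb:.2f} Gb")  # 1.72
print(f"repetitive DNA:      {repeat_gb:.2f} Gb")  # 1.18
```

Under that assumption the two figures are mutually consistent with a genome of roughly 1.7 Gb, of which a little under 1.2 Gb would be repetitive sequence.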

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Chromosome Mapping / methods
  • DNA, Plant
  • Disease Resistance / genetics
  • Ethylenes / chemistry
  • Evolution, Molecular
  • Gene Library
  • Genome, Plant*
  • Gossypium / genetics*
  • Models, Genetic
  • Phylogeny
  • Plant Diseases / genetics
  • Plant Diseases / prevention & control
  • Polyploidy
  • Retroelements
  • Sequence Analysis, DNA
  • Species Specificity
  • Terminal Repeat Sequences
  • Transcriptome
  • Verticillium

Substances

  • DNA, Plant
  • Ethylenes
  • Retroelements
  • ethylene