Family 1 glycosyltransferases (GTs), designated UDP-glycosyltransferases (UGTs), form the largest and most functionally important multigene family in the plant kingdom. In this study, we carried out a genome-wide identification, analysis, and comparison of 142, 146, and 196 putative UGTs from Gossypium raimondii, Gossypium arboreum, and Gossypium hirsutum, respectively. All members contain the conserved 44-amino-acid consensus sequence termed the plant secondary product glycosyltransferase (PSPG) motif. Based on the phylogenetic relationships among the cotton UGT proteins and those from other species, the G. raimondii and G. arboreum UGTs (GrUGTs and GaUGTs) could be classified into 16 major phylogenetic groups (A-P), whereas the G. hirsutum UGTs (GhUGTs) fall into 15 major groups, lacking group C. The cotton UGTs are dispersed across the chromosomes and occur in clusters that share the same open reading frame orientation. Their expansion appears to have resulted from genome duplication and rearrangement. Two conserved introns, A and B, were detected in most of the intron-containing UGTs in G. raimondii and G. arboreum, whereas only intron A was detected in the intron-containing UGTs in G. hirsutum. Furthermore, the expression patterns of UGT genes at the fiber initiation stage were analyzed in wild-type G. hirsutum and its near-isogenic fuzzless-lintless mutant using RNA-seq data. Overall, this study deepens our understanding of the structure, phylogeny, evolution, and expression of cotton UGT genes and provides a foundation for further cloning and functional studies of the UGT gene family.
Keywords: Cotton; Genome-wide; PSPG; UDP-glycosyltransferase.