A near-complete genome assembly of Thalia dealbata Fraser (Marantaceae)

Front Plant Sci. 2023 Jun 13:14:1183361. doi: 10.3389/fpls.2023.1183361. eCollection 2023.

Abstract

This study presents a chromosome-level, near-complete genome assembly of Thalia dealbata (Marantaceae), a typical emergent wetland plant with high ornamental and environmental value. Based on 36.99 Gb of PacBio HiFi reads and 39.44 Gb of Hi-C reads, we obtained a 255.05 Mb assembly, of which 251.92 Mb (98.77%) was anchored into eight pseudo-chromosomes. Five pseudo-chromosomes were completely assembled, and the other three had one to two gaps each. The final assembly had a high contig N50 (29.80 Mb) and a high benchmarking universal single-copy orthologs (BUSCO) recovery score (97.52%). The T. dealbata genome contained 100.35 Mb of repeat sequences, 24,780 protein-coding genes, and 13,679 non-coding RNAs. Phylogenetic analysis revealed that T. dealbata was most closely related to Zingiber officinale, from which it diverged approximately 55.41 million years ago. In addition, 48 significantly expanded and 52 significantly contracted gene families were identified within the T. dealbata genome. Moreover, 309 gene families were specific to T. dealbata, and 1,017 genes were positively selected. The T. dealbata genome reported in this study provides a valuable genomic resource for further research on wetland plant adaptation and genome evolution dynamics. This genome will also benefit comparative genomics of Zingiberales species and flowering plants.
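The summary statistics above can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names `n50` and `percent` are hypothetical, the contig lengths passed to `n50` are toy values rather than data from this assembly, and the repeat percentage is derived here from the reported sizes rather than stated in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the
    total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Reported sizes in Mb, taken from the abstract.
ASSEMBLY_MB = 255.05
ANCHORED_MB = 251.92
REPEAT_MB = 100.35

# Share of the assembly anchored to pseudo-chromosomes (reported as 98.77%).
anchored_pct = percent(ANCHORED_MB, ASSEMBLY_MB)

# Share of the assembly made up of repeats (derived, not reported).
repeat_pct = percent(REPEAT_MB, ASSEMBLY_MB)

# Toy contig lengths (assumed, for illustration only).
toy_n50 = n50([40, 30, 20, 10])

print(f"anchored: {anchored_pct}%  repeats: {repeat_pct}%  toy N50: {toy_n50}")
```

On the reported sizes, the anchored fraction agrees with the 98.77% given in the abstract, and repeats account for roughly 39% of the assembly.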

Keywords: PacBio HiFi; Thalia dealbata; genome annotation; near-complete genome assembly; wetland plant.

Grants and funding

The authors declare that this study received funding from Kunming Novo Medical Laboratory Co., Ltd. The funder had the following involvement in the study: designed and supervised the study, technical guidance, genome sequencing and assembly, analysis, interpretation of data and the writing of this article. All authors agreed to submit it for publication.