LocoGSE, a sequence-based genome size estimator for plants

Front Plant Sci. 2024 Mar 14:15:1328966. doi: 10.3389/fpls.2024.1328966. eCollection 2024.

Abstract

Extensive research has explored the range of genome sizes in eukaryotes, with particular emphasis on land plants, where significant variability has been observed. Accurate estimation of genome size is essential for various research purposes, but existing sequence-based methods have limitations, particularly for low-coverage datasets. In this study, we introduce LocoGSE, a novel genome size estimator designed specifically for low-coverage datasets generated by genome skimming approaches. LocoGSE relies on mapping reads to single-copy consensus proteins, without the need for a reference genome assembly. We calibrated LocoGSE using 430 low-coverage angiosperm genome skimming datasets and compared its performance against other estimators. Our results demonstrate that LocoGSE accurately predicts monoploid genome size even at very low depth of coverage (<1X) and on highly heterozygous samples. Additionally, LocoGSE provides stable estimates across individuals with varying ploidy levels. LocoGSE fills a gap in sequence-based plant genome size estimation by offering a user-friendly and reliable tool that does not rely on high coverage or reference assemblies. We anticipate that LocoGSE will facilitate plant genome size analysis and contribute to evolutionary and ecological studies in the field. Furthermore, at the cost of an initial calibration, LocoGSE can be used in other lineages.
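
The general principle behind depth-based estimation can be illustrated with a minimal sketch. This is not LocoGSE's actual code or interface: the function names, the factor of 3 nucleotides per amino acid used to convert protein length to coding length, and the `slope` calibration parameter are all assumptions made for illustration. The idea is that sequencing depth is approximated from reads mapped to single-copy proteins, and genome size is then the total number of sequenced bases divided by that (calibrated) depth.

```python
def estimate_depth(mapped_nucleotides, total_protein_length_aa):
    """Approximate sequencing depth over a single-copy protein set.

    Hypothetical helper: assumes each amino acid corresponds to
    3 nucleotides of coding sequence.
    """
    if total_protein_length_aa <= 0:
        raise ValueError("total_protein_length_aa must be positive")
    return mapped_nucleotides / (3 * total_protein_length_aa)


def estimate_genome_size(total_bases, depth, slope=1.0):
    """Estimate monoploid genome size in bases.

    `slope` stands in for a lineage-specific calibration factor
    (an assumption here; 1.0 means no correction is applied).
    """
    if depth <= 0 or slope <= 0:
        raise ValueError("depth and slope must be positive")
    return total_bases / (slope * depth)


# Illustrative numbers only, not data from the study.
depth = estimate_depth(mapped_nucleotides=450_000,
                       total_protein_length_aa=300_000)
genome_size = estimate_genome_size(total_bases=500_000_000, depth=depth)
print(f"depth = {depth:.2f}X, estimated genome size = {genome_size / 1e6:.0f} Mb")
```

With these made-up inputs, a depth of 0.5X over the single-copy set and 500 Mb of sequenced bases correspond to an estimate of about 1 Gb. A real calibration would fit `slope` against samples of known genome size.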

Keywords: 1C; 1Cx; environmental DNA; genome size; genome size estimation; genome-skimming; plant genomics; ploidy.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by Genoscope and the Commissariat à l’Énergie Atomique et aux Énergies Alternatives (CEA). Sequencing was performed within the framework of the PhyloAlps project, funded by France Génomique (ANR-10-INBS-09-08), and the PhyloNorway project, funded by the Research Council of Norway (226134/F50) and the Norwegian Biodiversity Information Centre (14-14, 70184209).