Two high-quality de novo genomes from single ethanol-preserved specimens of tiny metazoans (Collembola)

Gigascience. 2021 May 21;10(5):giab035. doi: 10.1093/gigascience/giab035.

Abstract

Background: Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in biological sciences and in biodiversity-related applied fields such as environmental management and natural product research. Advances in long-read DNA sequencing make it feasible to generate high-quality genomes for many species that are not genetic model organisms. However, long-read sequencing today relies on sizable quantities of high-quality, high molecular weight DNA, which is mostly obtained from fresh tissues. This is a challenge for biodiversity genomics of most metazoan species, which are tiny and need to be preserved immediately after collection. Here we present de novo genomes of 2 species of submillimeter Collembola. For each, we prepared the sequencing library from high molecular weight DNA extracted from a single specimen, using a novel ultra-low input protocol from Pacific Biosciences. This protocol requires only 5 ng of input DNA, made possible by a whole-genome amplification step.

Results: The 2 assembled genomes have N50 values >5.5 and >8.5 Mb, respectively, and both contain ∼96% of BUSCO genes. Thus, they are highly contiguous and complete. The genomes are supported by an integrative taxonomy approach, including placement in a genome-based phylogeny of Collembola and designation of a neotype for 1 of the species. Higher heterozygosity values are recorded in the more mobile species. Both species lack the biosynthetic pathway for β-lactam antibiotics known in several Collembola, confirming the tight correlation of antibiotic synthesis with the species' way of life.
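N50, the contiguity metric reported here, is the length L such that contigs of length L or longer together cover at least half of the total assembly length. The sketch below computes it from a list of contig lengths and compares the result with the Earth BioGenome Project minimum contig N50 of 1 Mb. The function name `n50`, the constant `EBP_MIN_CONTIG_N50`, and the contig lengths are hypothetical, chosen for illustration; they are not data or tools from this study.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths in bp.

    Contigs are sorted from longest to shortest and their lengths are
    summed; N50 is the length of the contig at which the running sum
    first reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Assumed threshold: the 1 Mb minimum contig N50 of the Earth BioGenome Project.
EBP_MIN_CONTIG_N50 = 1_000_000

# Hypothetical toy contig lengths in bp, not taken from the Collembola assemblies.
contigs = [9_000_000, 6_000_000, 3_000_000, 1_500_000, 500_000]
value = n50(contigs)
print(value, value >= EBP_MIN_CONTIG_N50)  # prints "6000000 True"
```

In this toy assembly of 20 Mb, the two longest contigs (9 + 6 = 15 Mb) are the first to cover half of the total, so N50 is the length of the second contig, 6 Mb.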

Conclusions: It is now possible to generate high-quality genomes from single specimens of minute, field-preserved metazoans, exceeding the minimum contig N50 (1 Mb) required by the Earth BioGenome Project.

Keywords: PacBio; eukaryote biodiversity; integrative taxonomy; long-read genome sequencing; low-input DNA; soil invertebrates.

MeSH terms

  • Animals
  • Arthropods* / genetics
  • Ethanol*
  • Genome
  • Genomics
  • High-Throughput Nucleotide Sequencing
  • Sequence Analysis, DNA

Substances

  • Ethanol