Metagenomics, particularly genome-resolved metagenomics, has significantly deepened our understanding of microbes, illuminating their taxonomic and functional diversity and their roles in ecology, physiology, and evolution. However, eukaryotic populations within various microbiomes, including those in the mammalian gastrointestinal (GI) tract, remain relatively underexplored in metagenomic studies because comprehensive reference genome databases and robust bioinformatics tools are lacking. The GI tract of ruminants, particularly the rumen, contains a high eukaryotic biomass but a relatively low diversity of ciliates and fungi. These eukaryotes significantly affect feed digestion, methane emissions, and rumen microbial ecology. In the present study, we developed GutEuk, a bioinformatics tool that improves upon the currently available tools Tiara and EukRep in accurately identifying eukaryotic sequences from metagenomes. GutEuk is optimized for high precision across different sequence lengths. It can also distinguish fungal from protozoal sequences, which helps elucidate their distinct ecological, physiological, and nutritional impacts. Applied to more than one thousand rumen metagenomes, GutEuk facilitated comprehensive analyses of protozoa and fungi, revealing a greater genomic diversity among protozoa than previously documented. We further curated several ruminant eukaryotic protein databases, significantly enhancing our ability to distinguish the functional roles of ruminant fungi and protozoa from those of prokaryotes. Overall, the newly developed package GutEuk and its associated databases create new opportunities for in-depth study of GI tract eukaryotes.
Published by Cold Spring Harbor Laboratory Press.