TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization

Front Public Health. 2023 Aug 1:11:1198213. doi: 10.3389/fpubh.2023.1198213. eCollection 2023.

Abstract

Introduction: The clinical incidence of antimicrobial-resistant fungal infections has increased dramatically in recent years. Certain fungal pathogens colonize various body cavities, leading to life-threatening bloodstream infections. However, the identification and characterization of fungal isolates in laboratories remain a significant diagnostic challenge in medicine and public health. Whole-genome sequencing provides an unbiased and uniform approach to identifying fungal pathogens, but most bioinformatic analysis pipelines focus on prokaryotic species. To this end, TheiaEuk_Illumina_PE_PHB (TheiaEuk) was designed specifically for the genomic analysis of fungal pathogens.

Methods: TheiaEuk was designed using containerized components and written in the Workflow Description Language (WDL) to facilitate deployment on the cloud-based open bioinformatics platform Terra. This species-agnostic workflow enables the analysis of fungal genomes without requiring coding, thereby reducing the entry barrier for laboratory scientists. To demonstrate the usefulness of this pipeline, an ongoing outbreak of Candida auris in southern Nevada was investigated. We performed whole-genome sequence analysis of 752 new C. auris isolates from this outbreak. Furthermore, TheiaEuk was used to observe the accumulation of mutations in the FKS1 gene over the course of the outbreak, highlighting the utility of TheiaEuk for monitoring emerging public health threats when combined with whole-genome sequencing surveillance of fungal pathogens.

Results: A primary result of this work is a curated fungal database containing 5,667 unique genomes representing 245 species. TheiaEuk also incorporates taxon-specific submodules for particular species, including clade-typing for Candida auris (C. auris). In addition, for several fungal species, it performs dynamic reference genome selection and variant calling, reporting mutations found in genes currently associated with antifungal resistance (FKS1, ERG11, FUR1). Using genome assemblies from the ATCC Mycology collection, the taxonomic identification module used by TheiaEuk correctly assigned genomes to the species level in 126/135 (93.3%) instances and to the genus level in 131/135 (97.0%) instances, with zero false calls. Application of TheiaEuk to actual specimens obtained in the course of work at a local public health laboratory resulted in 13/15 (86.7%) correct calls at the species level, with the remaining 2/15 called at the genus level and zero incorrect calls. TheiaEuk correctly assigned the clade type of C. auris in 297/302 (98.3%) instances.
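The proportions reported above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of TheiaEuk: the counts are taken from the Results text, while the dictionary labels and the `percent` helper are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# Counts as reported in the Results text: (correct calls, total).
# The labels are illustrative names, not identifiers from TheiaEuk.
REPORTED = {
    "ATCC assemblies, species level": (126, 135),
    "ATCC assemblies, genus level": (131, 135),
    "Public health lab specimens, species level": (13, 15),
    "C. auris clade typing": (297, 302),
}


def percent(correct: int, total: int, digits: int = 1) -> float:
    """Return correct/total as a percentage rounded to `digits` places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(float(Fraction(correct, total) * 100), digits)


if __name__ == "__main__":
    for label, (correct, total) in REPORTED.items():
        print(f"{label}: {correct}/{total} = {percent(correct, total)}%")
```

Rounded to one decimal place, the recomputed values agree with the percentages stated in the abstract.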

Discussion: TheiaEuk was effective in identifying fungal species from whole-genome sequence data. It was also accurate both in clade-typing of C. auris and in identifying mutations known to be associated with drug resistance in that organism.

Keywords: Candida auris; bioinformatics; emerging pathogens; epidemiology; whole-genome sequencing.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Computational Biology*
  • Disease Outbreaks
  • Genome, Fungal*
  • Genomics
  • Workflow