Whole-genome automated assembly pipeline for Chlamydia trachomatis strains from reference, in vitro and clinical samples using the integrated CtGAP pipeline

NAR Genom Bioinform. 2025 Jan 7;7(1):lqae187. doi: 10.1093/nargab/lqae187. eCollection 2025 Mar.

Abstract

Whole genome sequencing (WGS) is pivotal for the molecular characterization of Chlamydia trachomatis (Ct), the leading bacterial cause of sexually transmitted infections and infectious blindness worldwide. Ct WGS can inform epidemiologic, public health and outbreak investigations of this human-restricted pathogen. However, challenges persist in generating high-quality genomes for downstream analyses given its obligate intracellular nature and difficulty with in vitro propagation. No single tool exists for the entirety of Ct genome assembly, necessitating the adaptation of multiple programs with varying success. Compounding this issue is the absence of reliable Ct reference strain genomes. We therefore developed CtGAP (Chlamydia trachomatis Genome Assembly Pipeline) as an integrated 'one-stop-shop' pipeline for assembly and characterization of Ct genome sequencing data from various sources including isolates, in vitro samples, clinical swabs and urine. CtGAP, written in Snakemake, reports read quality statistics and performs adapter and quality trimming, host read removal, de novo and reference-guided assembly, contig scaffolding, selective ompA, multi-locus-sequence and plasmid typing, phylogenetic tree construction, and recombinant genome identification. Twenty Ct reference genomes were also generated. Successfully validated on a diverse collection of 363 samples containing Ct, CtGAP represents a novel pipeline requiring minimal bioinformatics expertise with easy adaptation for use with other bacterial species.
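
The stages listed in the abstract can be read as a dependency graph, which is how a Snakemake workflow orders its rules. The following is a minimal Python sketch of that idea, not CtGAP's actual code: the stage names paraphrase the abstract, while the dependency edges and the helper `execution_order` are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter

# Stage names paraphrase the abstract. The dependency edges below are
# ASSUMPTIONS for illustration only; they are not taken from the CtGAP source.
STAGES = {
    "read_qc_stats": set(),
    "adapter_quality_trimming": {"read_qc_stats"},
    "host_read_removal": {"adapter_quality_trimming"},
    "de_novo_assembly": {"host_read_removal"},
    "reference_guided_assembly": {"host_read_removal"},
    "contig_scaffolding": {"de_novo_assembly"},
    "ompA_mlst_plasmid_typing": {"contig_scaffolding", "reference_guided_assembly"},
    "phylogenetic_tree": {"contig_scaffolding", "reference_guided_assembly"},
    "recombinant_identification": {"phylogenetic_tree"},
}


def execution_order(stages):
    """Return one valid run order in which every stage follows its prerequisites."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
for step, name in enumerate(order, start=1):
    print(step, name)
```

A workflow manager resolves this kind of ordering automatically from each rule's declared inputs and outputs, which is one reason a single integrated pipeline can replace manually chaining separate programs.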

MeSH terms

  • Chlamydia Infections / diagnosis
  • Chlamydia Infections / microbiology
  • Chlamydia trachomatis* / genetics
  • Chlamydia trachomatis* / isolation & purification
  • Genome, Bacterial* / genetics
  • Humans
  • Phylogeny*
  • Software
  • Whole Genome Sequencing* / methods