GUNC: detection of chimerism and contamination in prokaryotic genomes

Genome Biol. 2021 Jun 13;22(1):178. doi: 10.1186/s13059-021-02393-0.

Abstract

Genomes are critical units in microbiology, yet ascertaining quality in prokaryotic genome assemblies remains a formidable challenge. We present GUNC (the Genome UNClutterer), a tool that accurately detects and quantifies genome chimerism based on the lineage homogeneity of individual contigs using a genome's full complement of genes. GUNC complements existing approaches by targeting previously underdetected types of contamination: we conservatively estimate that 5.7% of genomes in GenBank, 5.2% in RefSeq, and 15-30% of pre-filtered "high-quality" metagenome-assembled genomes in recent studies are undetected chimeras. GUNC provides a fast and robust tool to substantially improve prokaryotic genome quality.
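The idea of "lineage homogeneity of individual contigs" can be illustrated with a small sketch. This is not GUNC's implementation. It is a simplified, entropy-based illustration that assumes each gene already carries a contig ID and a taxon label. The function names `entropy` and `contig_separation`, the input layout, and the example taxon labels are all hypothetical. The score approaches 1 when foreign taxa sit on their own contigs (a chimera-like pattern) and stays near 0 when they are scattered evenly across contigs.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def contig_separation(genes):
    """Illustrative score: 1 - H(taxon | contig) / H(taxon).

    `genes` is a list of (contig_id, taxon_label) pairs (assumed input format).
    Returns 0.0 when all genes share one taxon, since nothing can be separated.
    """
    by_contig = defaultdict(list)
    for contig, taxon in genes:
        by_contig[contig].append(taxon)

    h_taxa = entropy([taxon for _, taxon in genes])
    if h_taxa <= 0:
        return 0.0

    n = len(genes)
    # Taxon entropy within each contig, weighted by the contig's share of genes.
    h_cond = sum(len(taxa) / n * entropy(taxa) for taxa in by_contig.values())
    return 1.0 - h_cond / h_taxa


# Two lineages, each confined to its own contig: chimera-like pattern.
chimeric = [("c1", "Escherichia")] * 4 + [("c2", "Bacteroides")] * 4
# Same two lineages, but spread identically across both contigs.
scattered = [(c, t) for c in ("c1", "c2")
             for t in ("Escherichia",) * 3 + ("Bacteroides",)]

print(contig_separation(chimeric))
print(contig_separation(scattered))
```

In this toy setup the first genome scores high because knowing the contig fully determines the taxon, whereas the second scores low because contig membership carries no information about lineage. A real tool would also need a reference database for gene-level taxonomic assignment and some calibration against a null expectation, which this sketch omits.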

Keywords: Bioinformatics; Genome contamination; Genome quality; Metagenome-assembled genomes; Metagenomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chimerism*
  • Computational Biology / methods*
  • Contig Mapping
  • Genome, Bacterial*
  • Metagenome*
  • Metagenomics / methods
  • Phylogeny
  • Prokaryotic Cells / cytology
  • Prokaryotic Cells / metabolism
  • Proteobacteria / genetics*
  • Software*