Phenetic Comparison of Prokaryotic Genomes Using k-mers

Mol Biol Evol. 2017 Oct 1;34(10):2716-2729. doi: 10.1093/molbev/msx200.

Abstract

Bacterial genomics studies are becoming more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparing genomes based on their k-mer content allows reconstruction of phenetic trees without the need for prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters derived from the complete genomic content of a genome population with clusters derived from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, clustering strains on a subset of k-mers allows the similarity of that clustering to the whole-genome clusters to be determined. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the population structure of the former is more strongly correlated with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets.
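
The idea of comparing genomes by k-mer content, and then comparing a clustering built from a k-mer subset with the whole-genome clustering, can be illustrated with a minimal Python sketch. It is not Ray Surveyor's implementation, and the abstract does not specify the similarity measure or correlation statistic used. The sketch assumes canonical k-mers, Jaccard distance between k-mer sets, and a Pearson correlation between the two sets of pairwise distances. All function names (`canonical_kmers`, `jaccard_distance`, `distance_matrix`, `pearson`) and the toy sequences are hypothetical.

```python
from itertools import combinations

# Assumption: plain ACGT alphabet; k-mers containing other symbols are skipped.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Return the set of canonical k-mers of seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both DNA strands map to the same key.
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, reverse_complement))
    return kmers


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes, k, subset=None):
    """Pairwise Jaccard distances between genomes (dict: name -> sequence).

    If subset is given (a set of canonical k-mers, e.g. from resistance
    genes), each genome's k-mer set is restricted to it first.
    """
    names = sorted(genomes)
    sets = {name: canonical_kmers(genomes[name], k) for name in names}
    if subset is not None:
        sets = {name: s & subset for name, s in sets.items()}
    return {
        (a, b): jaccard_distance(sets[a], sets[b])
        for a, b in combinations(names, 2)
    }


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    genomes = {
        "strainA": "ACGTTGCAACGTAGCTAGGATCC",
        "strainB": "ACGTTGCAACGTAGCTAGGTTCC",
        "strainC": "TTGACCGTAAGCTTGGCATCGAA",
    }
    k = 5
    whole = distance_matrix(genomes, k)
    # Hypothetical "functional category": k-mers from one short marker region.
    marker = canonical_kmers("ACGTAGCTAGG", k)
    partial = distance_matrix(genomes, k, subset=marker)
    pairs = sorted(whole)
    r = pearson([whole[p] for p in pairs], [partial[p] for p in pairs])
    print("whole-genome distances:", whole)
    print("subset distances:", partial)
    print("correlation between the two:", r)
```

The resulting pairwise distances could be passed to any standard hierarchical clustering or neighbor-joining routine to draw a phenetic tree. Real genomes would call for a larger k and a memory-efficient k-mer counter rather than Python sets.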

Keywords: comparative genomics; horizontal gene transfer; microbial evolution; population structure; software.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics
  • Biological Evolution
  • Cluster Analysis
  • Computational Biology / methods*
  • Computer Simulation
  • Evolution, Molecular
  • Genome, Bacterial / genetics*
  • Genomics / methods
  • Metagenomics
  • Phylogeny
  • Prokaryotic Cells
  • Sequence Analysis, DNA / methods*
  • Software