Genome-based analysis allows for large-scale classification of diverse bacteria and has been widely adopted for delineating species. Unfortunately, for higher taxonomic ranks such as genus, establishing a generally accepted approach based on genome analysis is challenging. While core-genome phylogenies depict the evolutionary relationships among species, determining the correspondence between clades and genera may not be straightforward. To quantify genotypic divergence, the percentage of conserved proteins and genome-wide average amino acid identity are commonly used, but these metrics often do not provide a clear threshold for classification. In this work, we investigated the utility of global comparisons and data visualization in identifying clusters of species based on their overall gene content, and reasoned that such patterns can be integrated with phylogeny and other information, such as phenotypes, to improve taxonomy. As a proof of concept, we selected 177 representative genome sequences from the Mycoplasmatales-Entomoplasmatales clade within the class Mollicutes for a case study. We found that the clustering patterns corresponded to the current understanding of these organisms, namely the split into three above-genus groups: Hominis, Pneumoniae and Spiroplasma-Entomoplasmataceae-Mycoides. However, at the genus level, several important issues were found. For example, recent taxonomic revisions that split the Hominis group into three genera and Entomoplasmataceae into five genera are problematic, as those newly described or emended genera lack clear differentiation in gene content from one another. Moreover, several cases of misclassification were identified. These findings demonstrate the utility of this approach and its potential application to other bacteria.
Keywords: Mollicutes; Mycoplasma; gene content; genome; genus; taxonomy.
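
The idea of grouping genomes by overall gene content can be illustrated with a minimal sketch. The abstract does not say which dissimilarity measure or clustering procedure was used, so everything below is an assumption chosen for illustration: each genome is a set of gene-family identifiers, dissimilarity is the Jaccard distance, and grouping is a simple single-linkage rule with an arbitrary cutoff. The genome names, family IDs, and the 0.6 threshold are hypothetical toy values, not data from the study.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of gene-family IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(gene_content):
    """Map each unordered genome pair to its gene-content distance."""
    return {
        (g1, g2): jaccard_distance(gene_content[g1], gene_content[g2])
        for g1, g2 in combinations(sorted(gene_content), 2)
    }


def single_linkage_clusters(gene_content, threshold):
    """Group genomes linked by any chain of pairs with distance <= threshold.

    Illustrative only: the threshold is an assumption, not a published cutoff.
    """
    parent = {g: g for g in gene_content}

    def find(g):
        # Follow parent links to the cluster representative (union-find).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g1, g2), dist in pairwise_distances(gene_content).items():
        if dist <= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for g in gene_content:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy input: genome name -> set of gene-family IDs.
toy = {
    "genome_A": {"f1", "f2", "f3", "f4"},
    "genome_B": {"f1", "f2", "f3", "f5"},
    "genome_C": {"f6", "f7", "f8"},
    "genome_D": {"f6", "f7", "f9"},
}

if __name__ == "__main__":
    for pair, dist in pairwise_distances(toy).items():
        print(pair, round(dist, 2))
    print(single_linkage_clusters(toy, threshold=0.6))
```

In this toy input, genomes A and B share three of five families and C and D share two of four, so the two pairs form separate groups at the chosen cutoff. Real analyses would start from homologous gene clusters inferred across all genomes and would typically inspect the full distance matrix visually rather than rely on a single threshold.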