Toward a high-quality pan-genome landscape of Bacillus subtilis by removal of confounding strains

Brief Bioinform. 2021 Mar 22;22(2):1951-1971. doi: 10.1093/bib/bbaa013.

Abstract

Pan-genome analysis is widely used to study the evolution and genetic diversity of species, particularly in bacteria. However, the impact of strain selection on the outcome of pan-genome analysis is poorly understood, and a standard protocol to ensure high-quality pan-genome results is lacking. In this study, we carried out a series of pan-genome analyses on different strain sets of Bacillus subtilis to understand how the choice of strains affects the performance and output quality of the analysis. We found that the results of B. subtilis pan-genome analyses can be influenced by the inclusion of incorrectly classified Bacillus subspecies strains, phylogenetically distinct strains, engineered genome-reduced strains, chimeric strains, strains with a large number of unique genes or a large proportion of pseudogenes, and multiple clonal strains. Because these confounding strains can seriously distort the quality and true landscape of the pan-genome, they should be removed during pan-genome analysis. Our study provides new insights into removing the biases introduced by confounding strains at the beginning of data processing, which yields a higher-quality, more credible representation of the B. subtilis pan-genome landscape. This procedure could be added as an important quality control step in pan-genome analyses to improve their efficiency, ultimately contributing to a better understanding of genome function, evolution and genome-reduction strategies for B. subtilis.
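
As an illustration only, the quality control step described above could be sketched as a pre-filter applied to per-strain metadata before the pan-genome is computed. The abstract names the categories of confounding strains but gives no numeric cut-offs, so everything concrete here is an assumption: the `Strain` record and its fields, the `filter_strains` helper, and the three threshold values are hypothetical placeholders, not the authors' method or parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strain:
    """Hypothetical per-strain summary; field names are illustrative."""
    name: str
    ani_to_reference: float       # average nucleotide identity (%) to a chosen reference
    unique_genes: int             # genes found in this strain only
    pseudogene_fraction: float    # pseudogenes / annotated genes
    engineered: bool = False      # genome-reduced or chimeric construct
    clonal_group: Optional[str] = None  # label shared by near-identical strains


# Assumed cut-offs for illustration; the abstract does not state any values.
MIN_ANI = 95.0
MAX_UNIQUE_GENES = 500
MAX_PSEUDOGENE_FRACTION = 0.10


def confounding_reason(strain: Strain) -> Optional[str]:
    """Return why a strain is considered confounding, or None if it passes."""
    if strain.engineered:
        return "engineered or chimeric genome"
    if strain.ani_to_reference < MIN_ANI:
        return "low ANI to reference"
    if strain.unique_genes > MAX_UNIQUE_GENES:
        return "too many unique genes"
    if strain.pseudogene_fraction > MAX_PSEUDOGENE_FRACTION:
        return "too many pseudogenes"
    return None


def filter_strains(strains):
    """Split strains into (kept, removed), keeping one strain per clonal group."""
    kept = []
    removed = {}          # strain name -> reason for removal
    seen_groups = set()
    for strain in strains:
        reason = confounding_reason(strain)
        if reason is None and strain.clonal_group is not None:
            if strain.clonal_group in seen_groups:
                reason = "redundant clonal strain"
            else:
                seen_groups.add(strain.clonal_group)
        if reason is None:
            kept.append(strain)
        else:
            removed[strain.name] = reason
    return kept, removed
```

Returning the removal reason alongside the kept set makes the filtering auditable, which matters if the cut-offs are later tuned for a different strain collection.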

Keywords: artificial genome; average nucleotide identity; genome reduction; pan-genome analysis; pan-genome landscape; phylogenetic relationship.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacillus subtilis / classification
  • Bacillus subtilis / genetics*
  • Chromosomes, Bacterial
  • Genetic Variation
  • Genome, Bacterial*
  • Phylogeny
  • Pseudogenes