Pangenome mining of the Streptomyces genus redefines species' biosynthetic potential

Omkar S Mohite; Tue S Jørgensen; Thomas J Booth; Pep Charusanti; Patrick V Phaneuf; Tilmann Weber; Bernhard O Palsson

doi:10.1186/s13059-024-03471-9

Genome Biol. 2025 Jan 14;26(1):9. doi: 10.1186/s13059-024-03471-9.

Authors

Omkar S Mohite¹, Tue S Jørgensen¹, Thomas J Booth¹, Pep Charusanti¹, Patrick V Phaneuf¹, Tilmann Weber², Bernhard O Palsson³ ⁴ ⁵ ⁶

Affiliations

¹ The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, 2800, Denmark.
² The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, 2800, Denmark.
³ The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, 2800, Denmark.
⁴ Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA.
⁵ Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA.
⁶ Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA.

PMID: 39810189
DOI: 10.1186/s13059-024-03471-9

Abstract

Background: Streptomyces is a highly diverse genus known for the production of secondary or specialized metabolites with a wide range of applications in the medical and agricultural industries. Several thousand complete or nearly complete Streptomyces genome sequences are now available, affording the opportunity to deeply investigate the biosynthetic potential within these organisms and to advance natural product discovery initiatives.

Results: We perform pangenome analysis on 2371 Streptomyces genomes, including approximately 1200 complete assemblies. Employing a data-driven approach based on genome similarities, we classify the Streptomyces genus into 7 primary and 42 secondary Mash-clusters, which form the basis for comprehensive pangenome mining. A refined workflow for grouping biosynthetic gene clusters (BGCs) redefines their diversity across different Mash-clusters. This workflow also reassigns 2729 known BGC families to only 440 families, a reduction caused by inaccuracies in BGC boundary detection. When the genomic location of BGCs is included in the analysis, a conserved genomic structure, or synteny, among BGCs becomes apparent within species and Mash-clusters. This synteny suggests that vertical inheritance is a major factor in the diversification of BGCs.

Conclusions: Our analysis of a genomic dataset at a scale of thousands of genomes refines predictions of BGC diversity using Mash-clusters as a basis for pangenome analysis. The observed conservation in the order of BGCs' genomic locations shows that the BGCs are vertically inherited. The presented workflow and the in-depth analysis pave the way for large-scale pangenome investigations and enhance our understanding of the biosynthetic potential of the Streptomyces genus.

Keywords: Streptomyces; Biosynthetic Gene Clusters; Genome mining; Metabolism; Pangenome analysis; Phylogenetic analysis.

MeSH terms

Biosynthetic Pathways / genetics
Genome, Bacterial*
Genomics
Multigene Family*
Phylogeny
Streptomyces* / genetics
Streptomyces* / metabolism
Synteny