CSEL-BGC: A Bioinformatics Framework Integrating Machine Learning for Defining the Biosynthetic Evolutionary Landscape of Uncharacterized Antibacterial Natural Products

Interdiscip Sci. 2024 Sep 30. doi: 10.1007/s12539-024-00656-5. Online ahead of print.

Abstract

The sluggish pace of new antibacterial drug development leaves us vulnerable to the severe threat currently posed by bacterial resistance. Microbial natural products (NPs), a reservoir of immense chemical potential, have emerged as the most promising avenue for the discovery of next-generation antibacterial agents. Directly assessing the antibacterial activity of the potential products derived from biosynthetic gene clusters (BGCs) would significantly expedite this discovery process. To this end, we propose CSEL-BGC, a framework that integrates machine learning (ML) techniques. The framework involves the development of a novel cascade-stacking ensemble learning (CSEL) model and the establishment of a groundbreaking model evaluation system. Using this framework, we predict 6,666 BGCs with antibacterial activity from 3,468 complete bacterial genomes and elucidate a biosynthetic evolutionary landscape that reveals their antibacterial potential. This provides crucial insights for interpreting the synthesis and secretion mechanisms of unknown NPs.

Keywords: Antibacterial natural products; Machine learning; Model evaluation; Phylogeny.