MACI: A machine learning-based approach to identify drug classes of antibiotic resistance genes from metagenomic data

Comput Biol Med. 2023 Oct 24:167:107629. doi: 10.1016/j.compbiomed.2023.107629. Online ahead of print.

Abstract

Novel methodologies are now essential for identifying antibiotic-resistant pathogens in order to combat them. Here, we present a model, MACI (Machine learning-based Antibiotic resistance gene-specific drug Class Identification), that takes metagenomic fragments as input and predicts the drug class of antibiotic resistance genes. We trained the model on the Comprehensive Antibiotic Resistance Database, which contains 5138 representative sequences across 134 drug classes. Among these classes, 23 dominated, contributing 85% of the sequence data. The model achieved an average precision of 0.8389 ± 0.0747 and recall of 0.8197 ± 0.0782 for these 23 drug classes. Additionally, it performed better (precision and recall: 0.8817 ± 0.0540 and 0.8620 ± 0.0493) when predicting multidrug resistance classes than when predicting single-drug resistance classes (0.7923 ± 0.0669 and 0.7737 ± 0.0794). The model also showed promising results when tested on an independent dataset. We then analysed these 23 drug classes to identify class-specific overlapping nucleotide patterns. Five significant drug classes, viz. "carbapenem; cephalosporin; penam", "cephalosporin", "cephamycin", "cephalosporin; monobactam; penam; penem", and "fluoroquinolone", were identified, and their patterns aligned with the functional domains of antibiotic resistance genes. These class-specific patterns play a pivotal role in rapidly identifying the drug classes of antibiotic resistance genes. Further analysis revealed that bacterial species carrying genes from these five drug classes are associated with well-known multidrug resistance properties.
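
The precision and recall figures above are reported as a mean with a spread over drug classes. The following is a minimal sketch of how such a summary could be computed from predicted and true drug-class labels. It is not the authors' evaluation code: the function names `per_class_precision_recall` and `summarize` and the toy labels are hypothetical, and it assumes the ± value is a sample standard deviation taken across classes, which the abstract does not specify.

```python
from collections import Counter
from statistics import mean, stdev


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label present in y_true."""
    true_count = Counter(y_true)   # sequences that really belong to each class
    pred_count = Counter(y_pred)   # sequences assigned to each class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in true_count:
        # Precision is defined as 0.0 here when a class is never predicted.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label]
        scores[label] = (precision, recall)
    return scores


def summarize(scores):
    """Return ((mean precision, sd), (mean recall, sd)) across classes."""
    precisions = [p for p, _ in scores.values()]
    recalls = [r for _, r in scores.values()]
    return (mean(precisions), stdev(precisions)), (mean(recalls), stdev(recalls))


# Toy labels for illustration only; these are not data from the study.
y_true = ["cephalosporin", "cephalosporin", "fluoroquinolone",
          "fluoroquinolone", "cephamycin", "cephamycin"]
y_pred = ["cephalosporin", "cephamycin", "fluoroquinolone",
          "fluoroquinolone", "cephamycin", "cephamycin"]

scores = per_class_precision_recall(y_true, y_pred)
(p_mean, p_sd), (r_mean, r_sd) = summarize(scores)
print(f"precision: {p_mean:.4f} ± {p_sd:.4f}")
print(f"recall:    {r_mean:.4f} ± {r_sd:.4f}")
```

Averaging per-class scores, rather than pooling all predictions, gives each drug class equal weight regardless of how many sequences it contains. Whether the study weighted classes this way is likewise an assumption.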

Keywords: Antibiotic resistance gene; Drug class; Gene sequencing; Machine learning; Metagenomic reads; Taxonomic clades.