A biochemically-interpretable machine learning classifier for microbial GWAS

Nat Commun. 2020 May 22;11(1):2580. doi: 10.1038/s41467-020-16310-9.

Abstract

Current machine learning classifiers have been successfully applied to whole-genome sequencing data to identify genetic determinants of antimicrobial resistance (AMR), but they lack causal interpretation. Here we present a metabolic model-based machine learning classifier, named Metabolic Allele Classifier (MAC), that uses flux balance analysis to estimate the biochemical effects of alleles. We apply the MAC to a dataset of 1595 drug-tested Mycobacterium tuberculosis strains and show that MACs predict AMR phenotypes with accuracy on par with mechanism-agnostic machine learning models (isoniazid AUC = 0.93) while enabling a biochemical interpretation of the genotype-phenotype map. Interpretation of MACs for three antibiotics (pyrazinamide, para-aminosalicylic acid, and isoniazid) recapitulates known AMR mechanisms and suggests a biochemical basis for how the identified alleles cause AMR. Extending flux balance analysis to identify accurate sequence classifiers therefore contributes mechanistic insights to GWAS, a field so far dominated by mechanism-agnostic results.
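For readers unfamiliar with the accuracy metric quoted above, the AUC can be read as the probability that a randomly chosen resistant strain receives a higher classifier score than a randomly chosen susceptible one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code: the function name `roc_auc` and the toy `labels` and `scores` are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for resistant, 0 for susceptible (assumed encoding).
    scores: classifier outputs, higher meaning "more likely resistant".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one strain of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # resistant strain ranked above susceptible one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up toy data for illustration only: three resistant, three susceptible.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, 8 of the 9 resistant/susceptible pairs are ordered correctly. The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value more efficiently on large strain collections.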

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aminosalicylic Acid / pharmacology
  • Anti-Bacterial Agents / pharmacology
  • Drug Resistance, Bacterial* / drug effects
  • Drug Resistance, Bacterial* / genetics
  • Genome, Bacterial
  • Genome, Microbial
  • Genome-Wide Association Study*
  • Isoniazid / pharmacology
  • Machine Learning*
  • Mycobacterium tuberculosis / drug effects*
  • Mycobacterium tuberculosis / genetics*
  • Pyrazinamide / pharmacology
  • Reproducibility of Results

Substances

  • Anti-Bacterial Agents
  • Pyrazinamide
  • Aminosalicylic Acid
  • Isoniazid