Combined network analysis and machine learning allows the prediction of metabolic pathways from tomato metabolomics data

Commun Biol. 2019 Jun 18;2:214. doi: 10.1038/s42003-019-0440-4. eCollection 2019.

Abstract

The identification and understanding of metabolic pathways is a key aspect of crop improvement and drug design. The common approach to detecting them is based on gene annotation and ontology. Correlation-based network analysis, in which metabolites are arranged into networks, is used as a complementary tool. Here, we demonstrate the detection of metabolic pathways based on correlation-based network analysis combined with machine-learning techniques. Metabolites of known tomato pathways, non-tomato pathways, and random sets of metabolites were mapped as subgraphs onto metabolite correlation networks of the tomato pericarp. Network features were computed for each subgraph and used to generate a machine-learning model. The model predicted the presence of the β-alanine-degradation-I, tryptophan-degradation-VII-via-indole-3-pyruvate (yet unknown to plants), β-alanine-biosynthesis-III, and melibiose-degradation pathways, although melibiose was not part of the networks. In vivo assays validated the presence of the melibiose-degradation pathway. For the remaining pathways, only some of the genes encoding regulatory enzymes were detected.
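The abstract does not specify the correlation measure, edge threshold, network features, or classifier, so the following is only a minimal, hypothetical Python sketch of the general idea: build a correlation network from metabolite abundance profiles, map a metabolite set onto it as an induced subgraph, compute a few subgraph features, and classify the set. It assumes Pearson correlation with an |r| ≥ 0.9 cutoff, three illustrative features (coverage, density, mean within-subgraph degree), and a nearest-centroid classifier standing in for the paper's machine-learning model. All function names and data are invented for illustration. Coverage is included because a pathway's metabolites need not all appear in the network; melibiose, for example, was absent from the networks.

```python
# Hypothetical sketch: the correlation measure, threshold, features, and
# classifier below are illustrative assumptions, not the paper's method.
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_network(profiles, threshold=0.9):
    """Connect two metabolites when |r| >= threshold (assumed cutoff)."""
    adj = {m: set() for m in profiles}
    for a, b in combinations(profiles, 2):
        if abs(pearson(profiles[a], profiles[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def subgraph_features(adj, metabolite_set):
    """Map a metabolite set onto the network as an induced subgraph.

    Returns [coverage, density, mean within-subgraph degree]."""
    nodes = {m for m in metabolite_set if m in adj}
    coverage = len(nodes) / len(metabolite_set) if metabolite_set else 0.0
    edges = sum(len(adj[m] & nodes) for m in nodes) // 2
    possible = len(nodes) * (len(nodes) - 1) / 2
    density = edges / possible if possible else 0.0
    mean_degree = 2 * edges / len(nodes) if nodes else 0.0
    return [coverage, density, mean_degree]


def train_centroids(examples):
    """Nearest-centroid 'model': average feature vector per label."""
    grouped = {}
    for feats, label in examples:
        grouped.setdefault(label, []).append(feats)
    return {lab: [mean(col) for col in zip(*rows)] for lab, rows in grouped.items()}


def predict(centroids, feats):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], feats))


# Toy abundance profiles across four samples (invented data).
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with A
    "C": [2.0, 3.0, 4.0, 5.0],  # perfectly correlated with A
    "D": [1.0, 3.0, 2.0, 4.0],
    "E": [2.0, 1.0, 4.0, 3.0],
}
adj = correlation_network(profiles, threshold=0.9)

training = [
    (subgraph_features(adj, {"A", "B", "C"}), "pathway"),
    (subgraph_features(adj, {"A", "B"}), "pathway"),
    (subgraph_features(adj, {"D", "E"}), "random"),
    (subgraph_features(adj, {"A", "D", "E"}), "random"),
]
centroids = train_centroids(training)

# "X" is absent from the network, so coverage drops below 1.
print(predict(centroids, subgraph_features(adj, {"B", "C", "X"})))  # prints "pathway"
```

In this toy setup, densely interconnected metabolite sets resemble the "pathway" centroid even when some members are missing from the network, which loosely mirrors how a pathway can be predicted although one of its metabolites was not measured. A real analysis would use many more features and a proper learning algorithm.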

Keywords: Computational models; Machine learning; Metabolomics; Network topology; Plant biotechnology.

MeSH terms

  • Machine Learning*
  • Metabolic Networks and Pathways
  • Metabolomics / methods*
  • Solanum lycopersicum / metabolism*