Improving genome-scale metabolic models of incomplete genomes with deep learning

iScience. 2024 Nov 7;27(12):111349. doi: 10.1016/j.isci.2024.111349. eCollection 2024 Dec 20.

Abstract

Deciphering microbial metabolism is essential for understanding ecosystem functions. Genome-scale metabolic models (GSMMs) predict metabolic traits from genomic data, but constructing GSMMs for uncultured bacteria is challenging because metagenome-assembled genomes are often incomplete, which leaves many gaps in the resulting models. We introduce the deep neural network guided imputation of reactomes (DNNGIOR), which uses AI to improve gap-filling by learning from the presence and absence of metabolic reactions across diverse bacterial genomes. Two key factors determine prediction accuracy: (1) the frequency of a reaction across all bacteria and (2) the phylogenetic distance of the query genome to the training genomes. DNNGIOR predictions achieve an average F1 score of 0.85 for reactions present in over 30% of training genomes. Compared with unweighted gap-filling, DNNGIOR-guided gap-filling was 14 times more accurate for draft reconstructions and 2-9 times more accurate for curated models.
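
To make the F1 metric concrete, the following is a minimal sketch of how an F1 score could be computed for one genome, assuming reactions are encoded as binary presence (1) / absence (0) vectors. The function name `f1_score_binary` and the toy vectors are hypothetical illustrations and are not taken from the paper or from the DNNGIOR code.

```python
def f1_score_binary(true_presence, predicted_presence):
    """Return the F1 score for two equal-length 0/1 vectors.

    Assumption: 1 means the reaction is present, 0 means it is absent.
    """
    if len(true_presence) != len(predicted_presence):
        raise ValueError("vectors must have the same length")

    pairs = list(zip(true_presence, predicted_presence))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted present
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted present, actually absent
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted absent, actually present

    if tp == 0:
        # No true positives: F1 is defined as 0 here to avoid division by zero.
        return 0.0

    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    return 2 * tp / (2 * tp + fp + fn)


# Toy example (made-up data): eight reactions in one hypothetical genome.
true_reactions = [1, 1, 1, 0, 0, 1, 0, 1]
predicted_reactions = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score_binary(true_reactions, predicted_reactions)
print(f"F1 = {score:.2f}")
```

In this toy case there are four true positives, one false positive, and one false negative, so precision and recall are both 0.8. An average such as the reported 0.85 would come from aggregating scores like this one over many reactions or genomes; the exact aggregation used by the authors is not stated in the abstract.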

Keywords: Biocomputational method; Computational Bioinformatics; Genomic analysis; Microbial genomics.