A data-driven modeling framework for mapping genotypes to synthetic microbial community functions

bioRxiv [Preprint]. 2025 Jan 4:2025.01.04.631316. doi: 10.1101/2025.01.04.631316.

Abstract

Microbial communities play a central role in transforming environments across Earth, driving both physical and chemical changes. Synthetic microbial communities, assembled from the bottom up, harness these capabilities and offer valuable insights into the mechanisms that govern community functions. These communities can also be tailored to produce desired outcomes, such as the synthesis of health-related metabolites or nitrogen fixation to improve plant productivity. Widely used computational models predict synthetic community functions using species abundances as inputs, which makes it impossible to predict the effects of species not included in the training data. We bridge this gap using a data-driven community genotype function (dCGF) model. By lifting the representation of each species to a high-dimensional genetic feature space, dCGF learns a mapping from community genetic feature matrices to community functions. We demonstrate that, in a fixed environmental context, dCGF can accurately predict the functions of communities composed in part or entirely of new species with known genetic features. In addition, dCGF helps identify the roles of species in a community function and generate hypotheses about how specific genetic features influence community functions. In sum, dCGF provides a new data-driven avenue for modeling synthetic microbial communities using genetic information, which could empower model-driven design of microbial communities.
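
The contrast between abundance-based inputs and genetic feature inputs can be illustrated with a toy sketch. This is not the dCGF architecture, which the abstract does not specify. It assumes a hypothetical binary gene presence/absence matrix `G`, sum pooling of each community's genetic feature matrix, an invented linear ground truth `true_w`, and ordinary least squares as a stand-in learner. All names and values are made up for illustration.

```python
import itertools
import numpy as np

# Hypothetical genetic features (gene presence/absence), one row per species.
# Species 0-3 appear in training communities; species 4-5 are "new".
G = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],  # new species, never seen in training
    [0, 1, 1, 1],  # new species, never seen in training
], dtype=float)

def community_matrix(members):
    """Community genetic feature matrix: one row of features per member."""
    return G[list(members)]

def pooled(members):
    """Sum-pool the matrix into a fixed-length vector (an assumption here)."""
    return community_matrix(members).sum(axis=0)

# Invented ground truth: the community function is linear in pooled features.
true_w = np.array([0.5, -1.0, 2.0, 0.3])

def measured_function(members):
    return pooled(members) @ true_w

# Training set: every community of size 1 to 3 drawn from species 0-3.
train = [c for k in (1, 2, 3) for c in itertools.combinations(range(4), k)]
X = np.array([pooled(c) for c in train])
y = np.array([measured_function(c) for c in train])

# Fit the stand-in model in genetic feature space.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# A community made entirely of new species still has a valid input,
# because its members are described by features, not by species identity.
new_community = (4, 5)
pred = pooled(new_community) @ w
truth = measured_function(new_community)
print(f"predicted={pred:.3f}  true={truth:.3f}")
```

An abundance-based model trained on species 0-3 would have no input column for species 4 or 5, so it could not represent `new_community` at all. In the feature-based representation, the prediction depends only on whether the training communities cover the relevant genetic features.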

Publication types

  • Preprint