Background: Undigested components of the human diet affect the composition and function of the microorganisms present in the gastrointestinal tract. Techniques like metagenomic analyses allow researchers to study functional capacity, thus revealing the potential of using metagenomic data for developing objective biomarkers of food intake.
Objectives: As a continuation of our previous work using 16S and metabolomic datasets, we aimed to utilize a computationally intensive, multivariate, machine-learning approach to identify fecal KEGG (Kyoto Encyclopedia of Genes and Genomes) Orthology (KO) categories as biomarkers that accurately classify food intake.
Methods: Data were aggregated from 5 controlled feeding studies that examined the individual impact of almonds, avocados, broccoli, walnuts, barley, and oats on the adult gastrointestinal microbiota. Deoxyribonucleic acid from preintervention and postintervention fecal samples underwent shotgun genomic sequencing. After preprocessing, sequences were aligned with Double Index AlignMent Of Next-generation sequencing Data v2.0.11.149 and functionally annotated with MEtaGenome ANalyzer v6.12.2. After count normalization, differential abundance analysis was conducted on the log of the fold change ratio for each resulting KO, comparing the pre- to postintervention change in the treatment group against that of its corresponding control. Differentially abundant KOs were used to train machine-learning models examining potential biomarkers in both single-food and multi-food models.
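As an illustration only, the log fold change ratio described in the Methods can be sketched for a single KO as follows. The function name, the base-2 logarithm, and the pseudocount are assumptions made for this sketch and are not taken from the study; the input values are hypothetical normalized counts.

```python
import math


def log_fold_change_ratio(pre_trt, post_trt, pre_ctl, post_ctl, pseudocount=1.0):
    """Return log2 of (treatment fold change / control fold change) for one KO.

    All inputs are normalized counts. The pseudocount (an assumption of this
    sketch) avoids division by zero when a KO is absent from a sample.
    """
    fc_trt = (post_trt + pseudocount) / (pre_trt + pseudocount)
    fc_ctl = (post_ctl + pseudocount) / (pre_ctl + pseudocount)
    return math.log2(fc_trt / fc_ctl)


# Hypothetical counts: the KO rises 4-fold under treatment and 2-fold under control.
example = log_fold_change_ratio(pre_trt=99.0, post_trt=399.0,
                                pre_ctl=99.0, post_ctl=199.0)

# Identical changes in both arms give a ratio of 1, hence a log of 0.
no_effect = log_fold_change_ratio(pre_trt=10.0, post_trt=20.0,
                                  pre_ctl=10.0, post_ctl=20.0)
```

A positive value indicates that the KO increased more in the treatment arm than in the control arm, and a negative value indicates the reverse.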
Results: We identified differentially abundant KOs in the almond (n = 54), broccoli (n = 2474), and walnut (n = 732) groups (q < 0.20). Random forest models trained on these KOs classified samples into each food group's treatment and control arms with accuracies of 80%, 87%, and 86% for the almond, broccoli, and walnut groups, respectively. The mixed-food random forest achieved 81% accuracy.
Conclusions: Our findings reveal promise in utilizing fecal metagenomics to objectively complement self-reported measures of food intake. Future research on various foods and dietary patterns will expand these exploratory analyses for eventual use in feeding study compliance and clinical settings.
Keywords: KEGG; dietary intake biomarkers; gastrointestinal microbiome; genomic sequencing; machine learning.
Copyright © 2023 American Society for Nutrition. Published by Elsevier Inc. All rights reserved.