Evaluation of gut microbiota predictive potential associated with phenotypic characteristics to identify multifactorial diseases

Gut Microbes. 2024 Jan-Dec;16(1):2297815. doi: 10.1080/19490976.2023.2297815. Epub 2024 Jan 18.

Abstract

Gut microbiota has been implicated in various clinical conditions, yet the substantial heterogeneity in gut microbiota research results necessitates a more sophisticated approach than merely identifying statistically different microbial taxa between healthy and unhealthy individuals. Our study seeks to not only select microbial taxa but also explore their synergy with phenotypic host variables to develop novel predictive models for specific clinical conditions.

Design: We assessed 50 healthy and 152 unhealthy individuals for phenotypic variables (PV) and gut microbiota (GM) composition by 16S rRNA gene sequencing. The entire modeling process was conducted in the R environment using the Random Forest algorithm. Model performance was assessed through ROC curve construction.
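As an illustration of the performance metric named above, the area under the ROC curve (AUC) can be read as the probability that a randomly chosen unhealthy individual receives a higher predicted score than a randomly chosen healthy one. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not data or code from the study, which was carried out in R.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0 (healthy) / 1 (unhealthy); scores: predicted
    probabilities, e.g. from a classifier. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three unhealthy (1) and three healthy (0) individuals.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc * 100:.2f}%")
```

The pairwise form is quadratic in sample size, which is acceptable for a cohort of a few hundred individuals; a rank-based formulation would be preferable for larger data sets.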

Results: We evaluated 52 bacterial taxa and pre-selected PV (p < 0.05) for their contribution to the final models. Across all diseases, the models achieved their best performance when GM and PV data were integrated. Notably, the integrated predictive models demonstrated exceptional performance for rheumatoid arthritis (AUC = 88.03%), type 2 diabetes (AUC = 96.96%), systemic lupus erythematosus (AUC = 98.4%), and type 1 diabetes (AUC = 86.19%).

Conclusion: Our findings underscore that the selection of bacterial taxa based solely on differences in relative abundance between groups is insufficient to serve as clinical markers. Machine learning techniques are essential for mitigating the considerable variability observed within gut microbiota. In our study, the use of microbial taxa alone exhibited limited predictive power for health outcomes, while the integration of phenotypic variables into predictive models substantially enhanced their predictive capabilities.

Keywords: 16S rRNA; Gut microbiota; phenotypic variables; prediction models; random forest.

Plain language summary

What is Already Known on this Subject?

  • While the gut microbiota has been implicated as potential signatures or biomarkers for various clinical conditions, the establishment of causality in humans remains largely elusive.
  • The role of the gut microbiota in maintaining the host organism’s proper physiological function is well-established, yet data regarding the composition of the gut microbiota in disease states often suffer from poor reproducibility.

What Are the New Findings?

  • Our study demonstrates that relying solely on differences in the relative abundance of bacterial taxa between groups falls short as a means of identifying clinical markers.
  • We advocate the use of robust statistical tools, such as bootstrapping, to mitigate the substantial variability observed in gut microbiota studies, thereby enhancing the reproducibility of research findings.
  • Our findings underscore the limited predictive power of microbial taxa in isolation for health outcomes.
  • The integration of phenotypic variables into predictive models with gut microbiota significantly augments the ability to predict health outcomes.

How This Study Might Advance Research

  • Despite the growing enthusiasm for using gut microbiota as biomarkers for various clinical conditions, the lack of standardization throughout the research process impedes progress in this field.
  • Our study emphasizes the necessity of rigorously testing predictions of clinical conditions based on gut microbiota using bootstrapping techniques, promoting greater reproducibility in research findings.
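The summary recommends bootstrapping but does not say how it was applied. One common use, shown here only as a sketch under that assumption, is to resample individuals with replacement and recompute the AUC each time, which gives a percentile interval that reflects sampling variability. The names `pair_auc` and `bootstrap_auc`, the stratified resampling scheme, and the toy data are all hypothetical and are not taken from the study.

```python
import random


def pair_auc(pos, neg):
    """AUC from positive and negative score lists (ties count as half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap.

    Resampling is stratified by class (an assumption of this sketch) so
    that every resample contains both healthy and unhealthy individuals.
    """
    rng = random.Random(seed)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    stats = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))  # sample with replacement
        boot_neg = rng.choices(neg, k=len(neg))
        stats.append(pair_auc(boot_pos, boot_neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return pair_auc(pos, neg), lower, upper


# Hypothetical toy data: 1 = unhealthy, 0 = healthy.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
point, lower, upper = bootstrap_auc(labels, scores)
print(f"AUC = {point:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

A wide interval on a small cohort signals that a single reported AUC may not reproduce in another sample, which is the reproducibility concern the summary raises.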

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers
  • Diabetes Mellitus, Type 2*
  • Gastrointestinal Microbiome* / genetics
  • Humans
  • RNA, Ribosomal, 16S / genetics

Substances

  • RNA, Ribosomal, 16S
  • Biomarkers

Grants and funding

DCF, LC, IMGR, APAP received a scholarship from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brazil (CAPES) – Finance Code 001. This publication is partially supported by FAPEMIG (APQ-03423-18).