PreLect: Prevalence leveraged consistent feature selection decodes microbial signatures across cohorts

NPJ Biofilms Microbiomes. 2025 Jan 3;11(1):3. doi: 10.1038/s41522-024-00598-2.

Abstract

The intricate nature of microbiota sequencing data (high dimensionality and sparsity) makes it challenging to identify informative and reproducible microbial features for both research and clinical applications. To address this, we introduce PreLect, a feature selection framework that harnesses microbial prevalence to promote consistent selection in sparse microbiota data. In rigorous benchmarking against established feature selection methods across 42 microbiome datasets, PreLect achieved better classification performance than statistical methods and outperformed machine learning-based methods by selecting features with greater prevalence and abundance. A key strength of PreLect is its ability to reliably identify reproducible microbial features across different cohorts. Applied to colorectal cancer, PreLect identifies key microbes and highlights pathways implicated in cancer progression, such as lipopolysaccharide and glycerophospholipid biosynthesis. This case study illustrates PreLect's utility in discerning clinically relevant microbial signatures. In summary, PreLect's accuracy and robustness make it a significant advance in the analysis of complex microbiota data.
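The abstract does not give PreLect's objective function, so the following is only a minimal sketch of the general idea of letting prevalence steer a sparse selector. It assumes that prevalence is the fraction of samples in which a feature is detected (nonzero count) and that prevalence enters as per-feature weights on an L1 penalty, with rarer features penalized more heavily. The names `prevalence`, `prevalence_penalty_weights`, and `weighted_soft_threshold`, as well as the 1/prevalence weighting, are hypothetical illustrations and not PreLect's published formulation. The soft-threshold function is the standard proximal operator of a weighted L1 penalty, as used inside proximal-gradient (lasso-type) solvers.

```python
import math
from typing import List, Sequence


def prevalence(counts: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of samples (rows) in which each feature (column) is nonzero."""
    n_samples = len(counts)
    n_features = len(counts[0])
    return [
        sum(1 for row in counts if row[j] > 0) / n_samples
        for j in range(n_features)
    ]


def prevalence_penalty_weights(prev: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical per-feature L1 weights: the rarer a feature, the larger its penalty.

    eps avoids division by zero for features never observed.
    """
    return [1.0 / (p + eps) for p in prev]


def weighted_soft_threshold(z: Sequence[float], lam: float,
                            weights: Sequence[float]) -> List[float]:
    """Proximal operator of lam * sum_j w_j * |beta_j|.

    Each coefficient is shrunk toward zero by lam * w_j; coefficients whose
    magnitude falls below that threshold are set exactly to zero (deselected).
    """
    return [
        math.copysign(max(abs(zj) - lam * wj, 0.0), zj)
        for zj, wj in zip(z, weights)
    ]


# Toy count table: 4 samples (rows) x 3 features (columns).
counts = [
    [0, 5, 1],
    [0, 3, 0],
    [2, 4, 0],
    [0, 6, 0],
]
prev = prevalence(counts)               # [0.25, 1.0, 0.25]
weights = prevalence_penalty_weights(prev)

# Equal raw coefficient estimates: only the prevalent feature survives shrinkage.
beta = weighted_soft_threshold([0.5, 0.5, 0.5], lam=0.2, weights=weights)
print(prev, beta)
```

Under these assumptions, three features with identical unpenalized coefficients are treated differently: the feature present in every sample keeps a nonzero coefficient, while the two features detected in only one sample are zeroed out. This mirrors, in a deliberately simplified form, the abstract's point that prevalence-aware selection favors features that are consistently observed.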