Block selection in multiblock partial least squares for modeling genotype-phenotype relations in Saccharomyces

PLoS One. 2025 Jan 2;20(1):e0316350. doi: 10.1371/journal.pone.0316350. eCollection 2025.

Abstract

In data-based modeling, correlations between explanatory variables often lead to the formation of distinct gene blocks. This study focuses on identifying influential gene blocks and the key genes within these blocks, with a particular application in mind: genotype-phenotype mapping in Saccharomyces. To cope with the limited sample size, we use partial least squares (PLS). Gene blocks, which consist of combinations of genes, play a critical role in explaining phenotypic variation. Building on multiblock partial least squares (mbPLS), we propose a novel approach, weighted block importance on projection (BwIP-mbPLS), to identify influential gene blocks. Variable importance on projection (VIP) is then used to select significant genes within these blocks. We model the efficiency and rate phenotypes of Saccharomyces cerevisiae for copper chloride at 0.375 mM and melibiose at 2%. k-means clustering, assessed by the silhouette index and the total within-cluster distance, groups 5629 genes into 18 gene blocks. BwIP-mbPLS identifies 4 gene blocks on average and significantly improves the prediction of efficiency-based phenotypes. In contrast, traditional block importance on projection (BIP-mbPLS) identifies 6 gene blocks on average and performs comparably to or better than BwIP-mbPLS for rate-based phenotypes. Most gene blocks contain fewer than 10 influential genes. Both proposed variants consistently outperform conventional partial least squares and multiblock partial least squares in predicting phenotypes. These results highlight the potential of our methods for advancing data-based modeling and genotype-phenotype mapping.
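The within-block gene selection step can be illustrated with a minimal sketch of variable importance on projection. The abstract does not give the block-level BIP or BwIP formulas, so they are not reproduced here. Everything below is an assumption for illustration rather than the authors' implementation: the single-response PLS fit by NIPALS, the function name `pls1_vip`, the synthetic data, and the commonly used VIP > 1 cutoff.

```python
import numpy as np


def pls1_vip(X, y, n_components=2):
    """Fit single-response PLS by NIPALS and return one VIP score per column of X."""
    Xr = X - X.mean(axis=0)          # centered copy; the caller's X is not modified
    yr = y - y.mean()
    n_vars = Xr.shape[1]
    W = np.zeros((n_vars, n_components))   # unit-norm weight vectors, one per component
    ss = np.zeros(n_components)            # response variation explained per component
    for a in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w                          # scores
        tt = t @ t
        q = (yr @ t) / tt                   # response loading
        Xr = Xr - np.outer(t, Xr.T @ t / tt)  # deflate X
        yr = yr - q * t                        # deflate y
        W[:, a] = w
        ss[a] = q**2 * tt
    # VIP_j = sqrt(p * sum_a ss_a * w_ja^2 / sum_a ss_a)
    return np.sqrt(n_vars * (W**2 @ ss) / ss.sum())


# Hypothetical data: 60 samples, 20 "genes", with only gene 0 driving the phenotype.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

vip = pls1_vip(X, y, n_components=2)
selected = np.flatnonzero(vip > 1.0)   # assumed cutoff, not taken from the abstract
print("genes with VIP > 1:", selected)
```

Because each weight vector has unit norm, the squared VIP scores average to 1 across variables, which is why 1 is a common reference threshold. In a multiblock setting, the same scores would be computed only for genes inside the blocks already flagged as influential.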

MeSH terms

  • Genetic Association Studies / methods
  • Genotype*
  • Least-Squares Analysis
  • Models, Genetic
  • Phenotype*
  • Saccharomyces / genetics
  • Saccharomyces cerevisiae* / genetics

Grants and funding

The author(s) received no specific funding for this work.