Block selection in multiblock partial least squares for modeling genotype-phenotype relations in Saccharomyces

Muhammad Tahir; Bu Yude; Tahir Mehmood; Saima Bashir; Zeeshan Ashraf

doi:10.1371/journal.pone.0316350

PLoS One. 2025 Jan 2;20(1):e0316350. doi: 10.1371/journal.pone.0316350. eCollection 2025.

Authors

Muhammad Tahir¹, Bu Yude¹, Tahir Mehmood², Saima Bashir¹, Zeeshan Ashraf³

Affiliations

¹ School of Mathematics and Statistics, Shandong University, Weihai, Shandong, China.
² School of Natural Sciences (SNS), National University of Sciences and Technology (NUST), Islamabad, Pakistan.
³ Department of Mathematics and Statistics, Riphah International University, Islamabad, Pakistan.

Abstract

In data-based modeling, correlations between explanatory variables often lead to the formation of distinct blocks of variables; in genomic data, these are gene blocks. Gene blocks, which consist of combinations of genes, play a critical role in explaining phenotypic variation. This study focuses on identifying influential gene blocks and the key variables within them, with a particular application in mind: genotype-phenotype mapping in Saccharomyces. To overcome the challenges of a limited sample size, we use partial least squares (PLS). Building on partial least squares with multiple blocks (mbPLS), we propose a novel approach, weighted block importance on projection in partial least squares (BwIP-mbPLS), to identify influential gene blocks. Variable importance on projection is then used to select significant genes within these blocks. Our study models growth efficiency and growth rate of Saccharomyces cerevisiae yeast in copper chloride at 0.375 mM and in melibiose at 2%. Analysis based on the silhouette index and the total within-cluster distance from k-means supports grouping the 5629 genes into 18 gene blocks. BwIP-mbPLS identifies 4 gene blocks on average and significantly improves the prediction of efficiency-based phenotypes. In contrast, traditional block importance in partial least squares projection identifies 6 gene blocks on average and shows comparable or better performance than BIP-mbPLS for rate-based phenotypes. Notably, most gene blocks contain fewer than 10 influential genes. Both proposed variants consistently outperform conventional approaches such as partial least squares and multi-block partial least squares in predicting phenotypes. These results highlight the potential of our methods for advancing data-based modeling and genotype-phenotype mapping.
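
To make the within-block selection step concrete, the sketch below computes variable importance on projection (VIP) for a single-response PLS model using the standard textbook formula. It is an illustration only and not the authors' implementation: the function names `pls1_nipals` and `vip_scores`, the synthetic data, and the choice of two components are assumptions made here, and the block-level BwIP-mbPLS weighting is not reproduced because the abstract does not specify it.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Fit single-response PLS (PLS1) by NIPALS on mean-centered data.

    Hypothetical helper for illustration. Returns the weight matrix W (p x A),
    the score matrix T (n x A), and the y-loadings q (length A).
    """
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    n, p = Xr.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xr.T @ yr                    # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                       # scores for this component
        tt = t @ t
        p_load = Xr.T @ t / tt           # X-loadings used for deflation
        q[a] = yr @ t / tt               # regression of y on the scores
        Xr = Xr - np.outer(t, p_load)    # deflate X
        yr = yr - q[a] * t               # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q

def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a)."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)         # y-variation explained per component
    w_norm = W / np.linalg.norm(W, axis=0)   # unit-length weight vectors
    return np.sqrt(p * (w_norm**2 @ ss) / ss.sum())

# Synthetic stand-in for one gene block: 200 samples, 8 variables,
# of which only the first two drive the response (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
influential = np.flatnonzero(vip > 1.0)   # common rule of thumb: VIP > 1
print(np.round(vip, 2), influential)
```

Because the squared VIP values sum to the number of variables, a threshold of 1 marks variables with above-average importance; this cutoff is a widely used convention and may differ from the criterion applied in the study.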

Copyright: © 2025 Tahir et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Genetic Association Studies / methods
Genotype*
Least-Squares Analysis
Models, Genetic
Phenotype*
Saccharomyces / genetics
Saccharomyces cerevisiae* / genetics

Grants and funding

The author(s) received no specific funding for this work.