Megavariate Methods Capture Complex Genotype-by-Environment Interactions

Genetics. 2024 Nov 4:iyae179. doi: 10.1093/genetics/iyae179. Online ahead of print.

Abstract

Genomic prediction models that capture genotype-by-environment interactions are useful for predicting site-specific performance because they leverage information from related individuals and correlated environments, but implementing such models is computationally challenging. This study describes the algorithms behind several scalable approaches: two models with latent representations of genotype-by-environment interactions, MegaLMM and MegaSEM, and an efficient multivariate mixed-model solver, PEGS, used to fit different covariance structures (unstructured, XFA, and HCS). Accuracy and runtime were benchmarked on simulated scenarios with varying numbers of genotypes and environments. MegaLMM and the PEGS-based XFA and HCS models provided the highest accuracy under sparse testing with 100 testing environments. The PEGS-based unstructured model was orders of magnitude faster than REML-based multivariate GBLUP while providing the same accuracy. MegaSEM had the lowest runtime, fitting a model with 200 traits and 20,000 individuals in approximately 5 minutes and a model with 2,000 traits and 2,000 individuals in less than 3 minutes. With the G2F data, the most accurate predictions were obtained from the univariate model fitted across environments and from averaging environment-level GEBVs from the models with HCS and XFA covariance structures.
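One reason structured covariances scale better than an unstructured one as the number of environments grows is parameter count. The sketch below builds a generic factor-analytic (FA) covariance among environments, Sigma = Lambda Lambda^T + diag(psi), and compares its number of free parameters with an unstructured covariance. This is a textbook FA structure under stated assumptions, not necessarily the XFA parameterization used in the study; the loadings and specific variances in the example are hypothetical values chosen for illustration.

```python
def fa_covariance(loadings, specific_var):
    """Build a q x q factor-analytic covariance: Sigma = Lambda Lambda^T + diag(psi).

    loadings: q rows (one per environment), each with k factor loadings (hypothetical).
    specific_var: q environment-specific variances (psi), also hypothetical.
    """
    q = len(loadings)
    sigma = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            # Covariance between environments i and j carried by the k latent factors
            sigma[i][j] = sum(a * b for a, b in zip(loadings[i], loadings[j]))
        # Environment-specific variance contributes only to the diagonal
        sigma[i][i] += specific_var[i]
    return sigma


def n_params_unstructured(q):
    # One variance per environment plus one covariance per pair: q(q+1)/2
    return q * (q + 1) // 2


def n_params_fa(q, k):
    # q*k loadings + q specific variances, minus k(k-1)/2 rotational constraints
    return q * k + q - k * (k - 1) // 2


if __name__ == "__main__":
    lam = [[0.9, 0.1], [0.8, -0.2], [0.3, 0.7]]  # 3 environments, 2 factors
    psi = [0.2, 0.3, 0.4]
    for row in fa_covariance(lam, psi):
        print([round(x, 3) for x in row])
    # With 100 environments: 5050 parameters (unstructured) vs 299 (FA, k = 2)
    print(n_params_unstructured(100), n_params_fa(100, 2))
```

With 100 environments, an unstructured covariance has 5,050 free parameters, while a two-factor FA structure has 299, which illustrates why factor-analytic and other structured models remain tractable in the sparse-testing scenarios described above.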

Keywords: Accuracy; Genomic Prediction; Matrix Decomposition; Multivariate Models.