An extension of the Walsh-Hadamard transform to calculate and model epistasis in genetic landscapes of arbitrary shape and complexity

PLoS Comput Biol. 2024 May 28;20(5):e1012132. doi: 10.1371/journal.pcbi.1012132. eCollection 2024 May.

Abstract

Accurate models describing the relationship between genotype and phenotype are necessary to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task, and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) remains an open question. The Walsh-Hadamard transform provides a rigorous computational framework for calculating and modeling epistatic interactions at the level of individual genotypic values (known as genetical, biological or physiological epistasis), and can therefore be used to address fundamental questions related to sequence-to-function encodings. However, one of its main limitations is that it can only accommodate two alleles (amino acid or nucleotide states) per sequence position. In this paper we provide an extension of the Walsh-Hadamard transform that allows the calculation and modeling of background-averaged epistasis (also known as ensemble epistasis) in genetic landscapes with an arbitrary number of states per position (20 for amino acids, 4 for nucleotides, etc.). We also provide a recursive formula for the inverse matrix and then derive formulae to directly extract any element of either matrix without having to rely on the computationally intensive task of constructing or inverting large matrices. Finally, we demonstrate the utility of our theory by using it to model epistasis within both simulated and empirical multiallelic fitness landscapes, revealing that both pairwise and higher-order genetic interactions are enriched between physically interacting positions.
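As a point of reference for the two-allele limitation mentioned above, the following is a minimal Python sketch of the classical biallelic case only; it is not the multiallelic extension derived in the paper. It assumes the standard Sylvester-type Hadamard matrix H and the diagonal weighting matrix V commonly used for background-averaged epistasis, so that the coefficients are V·H·y. Genotypes are assumed to be indexed by integers whose bits give the allele at each position (0 = wild type, 1 = mutant). All function names are hypothetical. Matrix entries are computed on the fly from bit counts, which illustrates, for two alleles, the idea of extracting individual elements without building the full matrices.

```python
def popcount(x):
    """Number of set bits, i.e. number of mutated positions in genotype x."""
    return bin(x).count("1")


def hadamard_element(i, j):
    """Entry (i, j) of the 2^n x 2^n Sylvester-type Hadamard matrix."""
    return -1 if popcount(i & j) % 2 else 1


def weight_element(i, n):
    """Diagonal entry i of the weighting matrix V (assumed convention:
    V is the n-fold Kronecker product of diag(1/2, -1))."""
    sign = -1 if popcount(i) % 2 else 1
    return sign / 2 ** (n - popcount(i))


def background_averaged_epistasis(y):
    """Coefficients V * H * y for a complete biallelic landscape y."""
    size = len(y)
    n = size.bit_length() - 1
    if size == 0 or 2 ** n != size:
        raise ValueError("y must hold 2^n phenotypes, one per genotype")
    return [
        weight_element(i, n)
        * sum(hadamard_element(i, j) * y[j] for j in range(size))
        for i in range(size)
    ]


def phenotypes_from_epistasis(eps):
    """Inverse map: y = H^{-1} V^{-1} eps, using H^{-1} = H / 2^n."""
    size = len(eps)
    n = size.bit_length() - 1
    scaled = [eps[i] / weight_element(i, n) for i in range(size)]
    return [
        sum(hadamard_element(j, i) * scaled[i] for i in range(size)) / size
        for j in range(size)
    ]


# Toy two-position landscape (made-up values), genotypes 00, 01, 10, 11.
y = [0.0, 1.0, 2.0, 5.0]
eps = background_averaged_epistasis(y)
print(eps)  # [2.0, 2.0, 3.0, 2.0]
```

In this toy example the four coefficients are the mean phenotype (2.0), the effect of each single mutation averaged over the two backgrounds of the other position (2.0 and 3.0), and the pairwise interaction term 5 - 1 - 2 + 0 = 2.0. Calling `phenotypes_from_epistasis(eps)` recovers the original phenotypes.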

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Epistasis, Genetic* / genetics
  • Genotype
  • Models, Genetic*
  • Mutation / genetics

Grants and funding

A.J.F. was funded by a Ramón y Cajal fellowship (RYC2021-033375-I funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR). B.L. and this project received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (883742), the Spanish Ministry of Science and Innovation (LCF/PR/HR21/52410004, EMBL Partnership), the Plan Estatal de Investigación Científica y Técnica y de Innovación (PID2020-118723GB-I00/AEI/10.13039/501100011033), the Bettencourt Schueller Foundation, the AXA Research Fund (AXA professor of Risk prediction in age-related diseases) and the Secretariat of Universities and Research, Ministry of Enterprise and Knowledge of the Government of Catalonia and the European Social Funds (2017 SGR 1322). V.M.P. was funded by the Spanish Ministry of Science and Innovation (PGC2018-100941-A-I00 AEI/FEDER, UE and PID2021-128976NB-I00 funded by MICIU/AEI/10.13039/501100011033/FEDER, UE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.