Genotypic-phenotypic landscape computation based on first principle and deep learning

Brief Bioinform. 2024 Mar 27;25(3):bbae191. doi: 10.1093/bib/bbae191.

Abstract

The relationship between genotype and fitness is fundamental to evolution, but quantitatively mapping genotypes to fitness has remained challenging. We propose the Phenotypic-Embedding theorem (P-E theorem), which bridges genotype and phenotype through an encoder-decoder deep learning framework. Inspired by this theorem, we propose a more general first principle for correlating genotype and phenotype; the P-E theorem provides a computable basis for applying that principle. As an application of the P-E theorem, we developed the Co-attention based Transformer model to bridge Genotype and Fitness, a Transformer-based pre-trained foundation model with downstream supervised fine-tuning that can accurately simulate the neutral evolution of viruses and predict immune escape mutations. Following the calculation path of the P-E theorem, we accurately obtained the basic reproduction number (R0) of SARS-CoV-2 from first principles, quantitatively linked immune escape to viral fitness and plotted the genotype-fitness landscape. The theoretical system we established provides a general and interpretable method for constructing genotype-phenotype landscapes and offers a new paradigm for theoretical and computational biology.

Keywords: SARS-CoV-2; deep learning; genotype-fitness landscape; immune escape; interpretability; the relative basic reproduction number (R0).

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • COVID-19* / genetics
  • COVID-19* / immunology
  • COVID-19* / virology
  • Computational Biology / methods
  • Deep Learning*
  • Genetic Fitness
  • Genotype*
  • Humans
  • Phenotype*
  • SARS-CoV-2* / genetics
  • SARS-CoV-2* / immunology