Genotypic-phenotypic landscape computation based on first principle and deep learning

Yuexing Liu; Yao Luo; Xin Lu; Hao Gao; Ruikun He; Xin Zhang; Xuguang Zhang; Yixue Li

doi:10.1093/bib/bbae191

Brief Bioinform. 2024 Mar 27;25(3):bbae191. doi: 10.1093/bib/bbae191.

Authors

Yuexing Liu¹, Yao Luo², Xin Lu¹, Hao Gao³, Ruikun He³, Xin Zhang³, Xuguang Zhang⁴, Yixue Li¹ ³ ⁵ ⁶ ⁷ ⁸ ⁹

Affiliations

¹ Guangzhou Laboratory, Guangzhou, Guangdong Province 510005, China.
² National University of Singapore, 21 Lower Kent Ridge Road, 119077, Singapore.
³ Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200030, China.
⁴ Mengniu Institute of Nutrition Science, Shanghai 200126, China.
⁵ GZMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macau Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Medical University, Guangzhou 511436, China.
⁶ Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China.
⁷ School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
⁸ Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200433, China.
⁹ Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai 200032, China.

Abstract

The relationship between genotype and fitness is fundamental to evolution, but quantitatively mapping genotypes to fitness has remained challenging. We propose the Phenotypic-Embedding theorem (P-E theorem), which bridges genotype and phenotype through an encoder-decoder deep learning framework. Inspired by this, we propose a more general first principle for correlating genotype and phenotype; the P-E theorem provides a computable basis for applying this first principle. As an application example of the P-E theorem, we developed the Co-attention based Transformer model to bridge Genotype and Fitness, a Transformer-based pre-trained foundation model with downstream supervised fine-tuning that can accurately simulate the neutral evolution of viruses and predict immune escape mutations. Following the calculation path of the P-E theorem, we accurately obtained the basic reproduction number ($R_0$) of SARS-CoV-2 from first principles, quantitatively linked immune escape to viral fitness and plotted the genotype-fitness landscape. The theoretical system we established provides a general and interpretable method for constructing genotype-phenotype landscapes and offers a new paradigm for theoretical and computational biology.

Keywords: SARS-CoV-2; deep learning; genotype-fitness landscape; immune escape; interpretability; the relative basic reproduction number (R0).

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
COVID-19* / genetics
COVID-19* / immunology
COVID-19* / virology
Computational Biology / methods
Deep Learning*
Genetic Fitness
Genotype*
Humans
Phenotype*
SARS-CoV-2* / genetics
SARS-CoV-2* / immunology
