Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy

Yunxin Xu; Di Liu; Haipeng Gong

doi:10.1038/s43588-024-00716-2

Nat Comput Sci. 2024 Nov;4(11):840-850. doi: 10.1038/s43588-024-00716-2. Epub 2024 Oct 25.

Authors

Yunxin Xu^{1,2}, Di Liu^{1,2}, Haipeng Gong^{3,4}

Affiliations

¹ MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China.
² Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China.
³ MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China.
⁴ Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China.

PMID: 39455825
DOI: 10.1038/s43588-024-00716-2

Abstract

Accurate prediction of protein mutation effects is of great importance in protein engineering and design. Here we propose GeoStab-suite, a suite of three geometric learning-based models (GeoFitness, GeoDDG and GeoDTm) for the prediction of the fitness score, ΔΔG and ΔT_m of a protein upon mutations, respectively. GeoFitness uses a specialized loss function that allows supervised training of a unified model on the large amount of multi-labeled fitness data in the deep mutational scanning database. To further improve the downstream tasks of ΔΔG and ΔT_m prediction, the encoder of GeoFitness is reused as a pre-trained module in GeoDDG and GeoDTm to overcome the shortage of labeled data. This pre-training strategy, in combination with data expansion, markedly improves model performance and generalizability. In the benchmark test, GeoDDG and GeoDTm outperform the other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient.
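
The benchmark comparison above is reported in terms of the Spearman correlation coefficient, which measures how well the ranking of predicted values agrees with the ranking of measured ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function names and the example ΔΔG values are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical measured and predicted ddG values for five mutations.
measured = [-1.2, 0.3, -2.5, 0.8, -0.4]
predicted = [-0.9, 0.2, -1.8, 0.1, -0.6]
rho = spearman(measured, predicted)
```

Because the metric depends only on ranks, it rewards a predictor that orders mutations correctly by their effect even when the predicted magnitudes are off.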

MeSH terms

Algorithms
Computational Biology / education
Computational Biology / methods
Databases, Protein
Machine Learning
Mutation*
Protein Stability*
Proteins* / chemistry
Proteins* / genetics

Substances

Proteins

Grants and funding

32171243/National Natural Science Foundation of China (National Science Foundation of China)