AlzDiscovery: A computational tool to identify Alzheimer's disease-causing missense mutations using protein structure information

Qisheng Pan; Georgina Becerra Parra; Yoochan Myung; Stephanie Portelli; Thanh Binh Nguyen; David B Ascher

doi:10.1002/pro.5147


Protein Sci. 2024 Oct;33(10):e5147. doi: 10.1002/pro.5147.

Authors

Qisheng Pan¹,², Georgina Becerra Parra¹,², Yoochan Myung¹,², Stephanie Portelli¹,², Thanh Binh Nguyen¹,², David B Ascher¹,²

Affiliations

¹ The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia.
² Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia.

Abstract

Alzheimer's disease (AD) is one of the most common forms of dementia and neurodegenerative disease, characterized by the formation of neuritic plaques and neurofibrillary tangles. Many different proteins participate in its complex pathogenic mechanism, and missense mutations can alter the folding and function of these proteins, significantly increasing the risk of AD. However, many existing methods for identifying AD-causing variants do not consider the effects of mutations in the context of the three-dimensional protein environment. Here, we present a machine learning-based analysis that distinguishes AD-causing mutations from their benign counterparts in 21 AD-related proteins, leveraging both sequence- and structure-based features. Using computational tools to estimate the effects of mutations on protein stability, we first observed that pathogenic mutations in familial AD-related proteins were biased toward significantly destabilizing effects. Incorporating this insight, we built a generic predictive model and improved its performance by tuning the sample weights during training. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.95 in the blind test and 0.70 in an independent clinical validation, outperforming all state-of-the-art methods. Feature interpretation indicated that the hydrophobic environment and polar interaction contacts were crucial in determining the pathogenicity of missense mutations. Finally, we present a user-friendly web server, AlzDiscovery, through which researchers can browse the predicted phenotypes of all possible missense mutations in these 21 AD-related proteins. Our study will be a valuable resource for AD screening and the development of personalized treatment.
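The abstract does not specify how the sample weights were tuned or how AUC was computed. A minimal sketch, assuming inverse-class-frequency ("balanced") weights, which may differ from the weighting the authors actually tuned, shows how per-sample weights enter a binary log-loss and how ROC AUC equals the probability that a randomly chosen pathogenic mutation scores higher than a randomly chosen benign one. All function names and data here are hypothetical illustrations, not part of AlzDiscovery.

```python
from __future__ import annotations

import math


def balanced_sample_weights(labels: list[int]) -> list[float]:
    """Assumed scheme: weight = n_samples / (n_classes * class_count).

    Each class then contributes the same total weight to the loss,
    which counteracts an imbalance between pathogenic (1) and benign (0).
    """
    n = len(labels)
    classes = sorted(set(labels))
    counts = {c: labels.count(c) for c in classes}
    return [n / (len(classes) * counts[y]) for y in labels]


def weighted_log_loss(labels: list[int], probs: list[float],
                      weights: list[float], eps: float = 1e-12) -> float:
    """Weighted binary cross-entropy, normalized by the total weight."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: one pathogenic and three benign mutations.
    y = [1, 0, 0, 0]
    scores = [0.9, 0.2, 0.4, 0.95]
    print(balanced_sample_weights(y))
    print(roc_auc(y, scores))
```

With one pathogenic and three benign examples, the pathogenic sample receives weight 2.0 and each benign sample about 0.667, so both classes contribute equally; the pairwise AUC for the toy scores is 2/3 because the pathogenic score beats two of the three benign scores. The quadratic pairwise loop is chosen for clarity; a rank-based formula scales better for large mutation sets.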

Keywords: AlphaFold2; Alzheimer's disease; machine learning; missense mutation; protein structure; web server.

MeSH terms

Alzheimer Disease* / genetics
Computational Biology / methods
Humans
Machine Learning
Mutation, Missense*
Protein Conformation
Software

Grants and funding

GNT1174405/National Health and Medical Research Council