Understanding structure-guided variant effect predictions using 3D convolutional neural networks

Gayatri Ramakrishnan; Coos Baakman; Stephan Heijl; Bas Vroling; Ragna van Horck; Jeffrey Hiraki; Li C Xue; Martijn A Huynen

doi:10.3389/fmolb.2023.1204157

Front Mol Biosci. 2023 Jul 5;10:1204157. doi: 10.3389/fmolb.2023.1204157. eCollection 2023.

Authors

Gayatri Ramakrishnan¹, Coos Baakman¹, Stephan Heijl², Bas Vroling², Ragna van Horck³, Jeffrey Hiraki³, Li C Xue¹, Martijn A Huynen¹

Affiliations

¹ Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands.
² Bio-Prodict, Nijmegen, Netherlands.
³ Vartion, Malden, Netherlands.

Abstract

Predicting pathogenicity of missense variants in molecular diagnostics remains a challenge despite the available wealth of data, such as evolutionary information, and the wealth of tools to integrate that data. We describe DeepRank-Mut, a configurable framework designed to extract and learn from physicochemically relevant features of amino acids surrounding missense variants in 3D space. For each variant, various atomic and residue-level features are extracted from its structural environment, including sequence conservation scores of the surrounding amino acids, and stored in multi-channel 3D voxel grids which are then used to train a 3D convolutional neural network (3D-CNN). The resultant model gives a probabilistic estimate of whether a given input variant is disease-causing or benign. We find that the performance of our 3D-CNN model, on independent test datasets, is comparable to other widely used resources which also combine sequence and structural features. Based on the 10-fold cross-validation experiments, we achieve an average accuracy of 0.77 on the independent test datasets. We discuss the contribution of the variant neighborhood in the model's predictive power, in addition to the impact of individual features on the model's performance. Two key features, the evolutionary information of residues in the variant neighborhood and their solvent accessibilities, were observed to influence the predictions. We also highlight how predictions are impacted by the underlying disease mechanisms of missense mutations and offer insights into understanding these to improve pathogenicity predictions. Our study presents aspects to take into consideration when adopting deep learning approaches for protein structure-guided pathogenicity predictions.
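
To illustrate the idea of storing features in multi-channel 3D voxel grids, the following is a minimal sketch and not the DeepRank-Mut implementation. It assumes simple nearest-voxel binning, a hypothetical `voxelize` function, a 10-voxel cubic grid at 1 Å resolution, and two illustrative feature channels (a conservation score and a solvent accessibility value per atom). All of these names and parameter values are assumptions made for the example.

```python
import numpy as np

def voxelize(coords, features, center, grid_size=10, resolution=1.0):
    """Bin per-atom feature values into a (channels, n, n, n) grid centred on `center`.

    Hypothetical helper for illustration; grid_size and resolution are assumed values.
    """
    coords = np.asarray(coords, dtype=float)
    features = np.asarray(features, dtype=float)
    n_channels = features.shape[1]
    grid = np.zeros((n_channels, grid_size, grid_size, grid_size))
    half = grid_size * resolution / 2.0
    # Voxel index of each atom, measured from the lower corner of the grid.
    idx = np.floor((coords - np.asarray(center, dtype=float) + half) / resolution).astype(int)
    # Atoms outside the cube around the variant are ignored.
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    for (i, j, k), feat in zip(idx[inside], features[inside]):
        grid[:, i, j, k] += feat  # one value per feature channel
    return grid

# Three atoms; columns of `features` are (conservation, solvent accessibility).
coords = [[0.2, 0.1, -0.3], [1.5, 0.0, 0.0], [50.0, 50.0, 50.0]]
features = [[1.0, 0.8], [0.5, 0.1], [9.0, 9.0]]
grid = voxelize(coords, features, center=[0.0, 0.0, 0.0])
print(grid.shape)
```

A grid of this shape, with one channel per feature, is the kind of input a 3D-CNN consumes. The third atom lies far from the assumed variant position and therefore contributes nothing to the grid.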

Keywords: 3D CNN; gain-of-function; loss-of-function; machine learning; missense variant; protein structure.

Grants and funding

This research was supported by the Europees Fonds voor Regionale Ontwikkeling (EFRO) (R0005582). LX acknowledges support from the Hypatia Fellowship from RadboudUMC (Rv819.52706). The work was carried out on the National Computer Facilities (NWO-2021.047).