Protein Binding Site Representation in Latent Space

Frederieke Lohmann; Stephan Allenspach; Kenneth Atz; Carl C G Schiebroek; Jan A Hiss; Gisbert Schneider

doi:10.1002/minf.202400205

Mol Inform. 2024 Dec 18:e202400205. doi: 10.1002/minf.202400205. Online ahead of print.

Authors

Frederieke Lohmann¹, Stephan Allenspach¹, Kenneth Atz¹, Carl C G Schiebroek¹, Jan A Hiss¹ ², Gisbert Schneider¹ ²

Affiliations

¹ Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zürich, Switzerland.
² Department of Biosystems Science and Engineering, ETH Zurich, Klingelbergstrasse 48, 4056, Basel, Switzerland.

PMID: 39692081

Abstract

Interpretability and reliability of deep learning models are important for computer-based drug discovery. Aiming to understand feature perception by such a model, we investigate a graph neural network for affinity prediction of protein-ligand complexes. We assess a latent representation of ligand binding sites and investigate underlying geometric structure in this latent space and its relation to protein function. We introduce an automated computational pipeline for dimensionality reduction, clustering, hypothesis testing, and visualization of latent space. The results indicate that the learned protein latent space is inherently structured and not randomly distributed. Several of the identified protein binding site clusters in latent space correspond to functional protein families. Ligand size was found to be a determinant of cluster geometry. The computational pipeline proved applicable to latent space analysis and interpretation and can be adapted to work for different datasets and deep learning models.

Keywords: drug discovery; interpretability; machine learning; neural network; protein structure.
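
Illustrative sketch (not from the paper): the abstract names hypothesis testing as one pipeline step and reports that several latent-space clusters correspond to functional protein families. One way such a correspondence could be checked is a permutation test on cluster purity. The abstract does not state which test the authors used, so the purity statistic, the permutation scheme, the function names, and the toy labels below are all assumptions made for illustration.

```python
import random
from collections import Counter


def cluster_purity(cluster_ids, family_labels):
    """Fraction of points whose family is the majority family of their cluster."""
    by_cluster = {}
    for c, fam in zip(cluster_ids, family_labels):
        by_cluster.setdefault(c, Counter())[fam] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


def permutation_p_value(cluster_ids, family_labels, n_permutations=2000, seed=0):
    """One-sided p-value: how often shuffled family labels reach the observed purity."""
    rng = random.Random(seed)
    observed = cluster_purity(cluster_ids, family_labels)
    shuffled = list(family_labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between cluster and family
        if cluster_purity(cluster_ids, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: three latent-space clusters, three protein families.
clusters = [0] * 10 + [1] * 10 + [2] * 10
families = ["kinase"] * 10 + ["protease"] * 10 + ["nuclear receptor"] * 10
purity, p_value = permutation_p_value(clusters, families)
print(f"purity={purity:.2f}, p={p_value:.4f}")
```

In a real analysis the cluster assignments would come from clustering the dimensionality-reduced binding site embeddings, and the family labels from an external protein annotation. A small p-value would indicate that the agreement between clusters and families is unlikely under random labeling.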