GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression

Victoria Bourgeais; Farida Zehraoui; Blaise Hanczar

doi:10.1093/bioinformatics/btac147

Bioinformatics. 2022 Apr 28;38(9):2504-2511. doi: 10.1093/bioinformatics/btac147.

Authors

Victoria Bourgeais¹, Farida Zehraoui¹, Blaise Hanczar¹

Affiliation

¹ Computer Science Department, IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes 91020, France.

PMID: 35266505

Abstract

Motivation: Medical care is becoming increasingly tailored to individual patients as omics data become more widely available. Applying sophisticated machine learning models, in particular deep learning (DL), to these data can advance precision medicine. However, the use of such models in clinics is limited because their predictions are not accompanied by an explanation. Incorporating domain knowledge can help produce predictions that are both accurate and intelligible. Knowledge-based DL models therefore appear to be a promising solution.

Results: In this article, we propose GraphGONet, a new self-explaining neural network whose hidden layers encapsulate the Gene Ontology. Each neuron in these layers represents a biological concept and combines the patient's gene expression profile with information from its neighboring neurons. The experiments described in the article confirm that our model not only performs as accurately as state-of-the-art (non-explainable) models but also automatically produces stable and intelligible explanations composed of the biological concepts with the highest contribution. This feature allows experts to use our tool in a medical setting.
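
To make the idea of "one neuron per biological concept" concrete, the following is a minimal illustrative sketch in plain Python. It is not the GraphGONet implementation: the toy ontology, the term and gene names, the tanh activation, the weights and the contribution score (output weight times activation) are all assumptions chosen for illustration. It shows only the general pattern the abstract describes, where each term's activation combines the expression of its annotated genes with the activations of neighboring terms, and the explanation lists the terms with the largest contribution.

```python
import math

# Hypothetical toy ontology (NOT real GO identifiers): child term -> parent terms.
PARENTS = {
    "GO:leaf_a": ["GO:mid"],
    "GO:leaf_b": ["GO:mid"],
    "GO:mid": ["GO:root"],
    "GO:root": [],
}
# Hypothetical annotations: term -> genes directly attached to it.
ANNOTATIONS = {
    "GO:leaf_a": ["gene1", "gene2"],
    "GO:leaf_b": ["gene3"],
    "GO:mid": [],
    "GO:root": [],
}
# Children are listed before their parents so each neuron sees ready inputs.
TOPO_ORDER = ["GO:leaf_a", "GO:leaf_b", "GO:mid", "GO:root"]


def forward(expression, gene_w, edge_w):
    """Compute one activation per ontology term for a single patient."""
    children = {term: [] for term in PARENTS}
    for child, parents in PARENTS.items():
        for parent in parents:
            children[parent].append(child)
    act = {}
    for term in TOPO_ORDER:
        # Contribution of the genes annotated to this term.
        z = sum(gene_w[(g, term)] * expression[g] for g in ANNOTATIONS[term])
        # Contribution of neighboring (child) term neurons.
        z += sum(edge_w[(c, term)] * act[c] for c in children[term])
        act[term] = math.tanh(z)  # assumed activation function
    return act


def top_concepts(act, out_w, k=2):
    """Rank terms by |output weight * activation| (an assumed score)."""
    contrib = {term: out_w[term] * a for term, a in act.items()}
    return sorted(contrib, key=lambda t: abs(contrib[t]), reverse=True)[:k]


# Made-up patient profile and weights, for illustration only.
expression = {"gene1": 1.0, "gene2": 0.5, "gene3": -1.0}
gene_w = {(g, t): 1.0 for t, genes in ANNOTATIONS.items() for g in genes}
edge_w = {(c, p): 1.0 for c, ps in PARENTS.items() for p in ps}
out_w = {"GO:leaf_a": 0.1, "GO:leaf_b": 2.0, "GO:mid": 0.5, "GO:root": 0.0}

activations = forward(expression, gene_w, edge_w)
explanation = top_concepts(activations, out_w, k=2)
print(explanation)
```

Because every hidden unit is tied to a named term, the ranked list is directly readable as a set of biological concepts, which is the property the abstract emphasizes. How the published model weights, propagates and selects concepts is described in the article itself.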

Availability and implementation: GraphGONet is freely available at https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet.git. The microarray dataset is accessible from the ArrayExpress database under the identifier E-MTAB-3732. The TCGA datasets can be downloaded from the Genomic Data Commons (GDC) data portal.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Gene Expression
Gene Ontology
Machine Learning*
Neural Networks, Computer*
Phenotype