A Knowledge Graph Approach to Elucidate the Role of Organellar Pathways in Disease via Biomedical Reports

Alexander R Pelletier; Dylan Steinecke; Dibakar Sigdel; Irsyad Adam; J Harry Caufield; Vladimir Guevara-Gonzalez; Joseph Ramirez; Aarushi Verma; Kaitlyn Bali; Katherine Downs; Wei Wang; Alex Bui; Peipei Ping

doi:10.3791/65084

J Vis Exp. 2023 Oct 13:(200). doi: 10.3791/65084.

Affiliations

¹ Department of Physiology, UCLA School of Medicine; Scalable Analytics Institute (ScAi) at Department of Computer Science, UCLA School of Engineering; NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA.
² Department of Physiology, UCLA School of Medicine; NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA; Medical Informatics, University of California at Los Angeles (UCLA).
³ Department of Physiology, UCLA School of Medicine.
⁴ Department of Physiology, UCLA School of Medicine; Scalable Analytics Institute (ScAi) at Department of Computer Science, UCLA School of Engineering; NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA.
⁵ NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA; Medical Informatics, University of California at Los Angeles (UCLA).
⁶ Department of Physiology, UCLA School of Medicine; Scalable Analytics Institute (ScAi) at Department of Computer Science, UCLA School of Engineering; NIH BRIDGE2AI Center at UCLA & NHLBI Integrated Cardiovascular Data Science Training Program, UCLA; Medical Informatics, University of California at Los Angeles (UCLA); Department of Medicine (Cardiology), UCLA School of Medicine.

PMID: 37902366
DOI: 10.3791/65084

Abstract

The vast and rapidly growing body of biomedical reports, each containing numerous entities and detailed information, is a rich resource for biomedical text-mining applications. These tools enable investigators to integrate, conceptualize, and translate these discoveries to uncover new insights into disease pathology and therapeutics. In this protocol, we present CaseOLAP LIFT, a new computational pipeline that investigates cellular components and their disease associations by extracting user-selected information from text datasets (e.g., biomedical literature). The software identifies sub-cellular proteins and their functional partners within disease-relevant documents. Additional disease-relevant documents are identified via the software's label imputation method. To contextualize the resulting protein-disease associations and to integrate information from multiple relevant biomedical resources, a knowledge graph is automatically constructed for further analyses. We present one use case, based on a corpus of ~34 million text documents downloaded from online sources, that illustrates how this method elucidates the role of mitochondrial proteins in distinct cardiovascular disease phenotypes. Furthermore, a deep learning model was applied to the resulting knowledge graph to predict previously unreported relationships between proteins and diseases, yielding 1,583 associations with predicted probabilities >0.90 and an area under the receiver operating characteristic curve (AUROC) of 0.91 on the test set. This software features a highly customizable and automated workflow, with a broad scope of raw data available for analysis; therefore, using this method, protein-disease associations can be identified with enhanced reliability within a text corpus.
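As a reading aid for the two evaluation figures quoted above (the >0.90 probability cutoff and the AUROC), the following minimal Python sketch shows how such quantities are commonly computed. It is not the CaseOLAP LIFT implementation: the function names, the toy labels and scores, and the placeholder protein and disease identifiers are all assumptions made for illustration only.

```python
from typing import List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def high_confidence(
    predictions: List[Tuple[str, str, float]], threshold: float = 0.90
) -> List[Tuple[str, str, float]]:
    """Keep (protein, disease, probability) triples strictly above the cutoff."""
    return [(prot, dis, p) for prot, dis, p in predictions if p > threshold]


# Toy held-out labels and model scores (hypothetical, not from the paper).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.95, 0.80, 0.40, 0.85, 0.60, 0.10]

# Placeholder predictions for unlabeled protein-disease pairs (hypothetical).
toy_predictions = [
    ("PROTEIN_A", "DISEASE_X", 0.97),
    ("PROTEIN_B", "DISEASE_Y", 0.90),
    ("PROTEIN_C", "DISEASE_X", 0.42),
]

if __name__ == "__main__":
    print("AUROC on toy test set:", auroc(toy_labels, toy_scores))
    print("Pairs above cutoff:", high_confidence(toy_predictions))
```

The pairwise formulation is quadratic in the number of examples and is used here only because it makes the definition explicit; a library routine would be the usual choice on a real test set.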
