Knowledge graph-based thought: a knowledge graph-enhanced LLM framework for pan-cancer question answering

Yichun Feng; Lu Zhou; Chao Ma; Yikai Zheng; Ruikun He; Yixue Li

doi:10.1093/gigascience/giae082

Gigascience. 2025 Jan 6:14:giae082. doi: 10.1093/gigascience/giae082.

Authors

Yichun Feng¹,², Lu Zhou², Chao Ma³, Yikai Zheng², Ruikun He⁴,⁵, Yixue Li¹,²

Affiliations

¹ Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, 310024 Hangzhou, China.
² Guangzhou National Laboratory, Guangzhou International Bio Island, 510005 Guangzhou, China.
³ Smartquerier Gene Technology (Shanghai) Co., Ltd., 200100 Shanghai, China.
⁴ BYHEALTH Institute of Nutrition & Health, 510663 Guangzhou, China.
⁵ Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, 200030 Shanghai, China.

PMID: 39775838
DOI: 10.1093/gigascience/giae082

Abstract

Background: In recent years, large language models (LLMs) have shown promise in various domains, notably in biomedical sciences. However, their real-world application is often limited by issues like erroneous outputs and hallucinatory responses.

Results: We developed the knowledge graph-based thought (KGT) framework, which integrates LLMs with knowledge graphs (KGs) and refines the LLM's initial response using verifiable information from the KG, thereby reducing factual errors in reasoning. The KGT framework is adaptable and performs well across various open-source LLMs. Notably, KGT can facilitate the discovery of new uses for existing drugs through potential drug-cancer associations and can assist in predicting resistance by analyzing relevant biomarkers and genetic mechanisms. To evaluate knowledge graph question answering within biomedicine, we used a pan-cancer knowledge graph to develop a benchmark named pan-cancer question answering (PcQA).
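
The general idea of grounding an LLM answer in knowledge graph facts can be illustrated with a minimal sketch. This is not the authors' implementation: the triples use placeholder entity names, and the function names (build_index, find_paths, build_prompt) and the max_hops parameter are hypothetical choices made for illustration. The sketch retrieves relation paths between two entities in a toy graph and composes a prompt that restricts the model to those retrieved facts.

```python
from collections import defaultdict

# Toy triples with placeholder names (assumption: not from the paper's graph).
TRIPLES = [
    ("DrugA", "targets", "GeneX"),
    ("GeneX", "associated_with", "CancerY"),
    ("DrugB", "targets", "GeneZ"),
]


def build_index(triples):
    """Index triples by head entity for quick neighbor lookup."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def find_paths(index, start, goal, max_hops=3):
    """Depth-first search for relation paths from start to goal.

    The hop limit bounds the search, so cycles cannot loop forever.
    """
    paths = []
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:
            continue
        for relation, tail in index.get(node, []):
            stack.append((tail, path + [(node, relation, tail)]))
    return paths


def build_prompt(question, paths):
    """Compose a prompt asking the model to answer only from retrieved facts."""
    facts = [f"{h} {r} {t}" for path in paths for (h, r, t) in path]
    if not facts:
        return f"No supporting facts were found.\nQuestion: {question}"
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return f"Answer using only these facts:\n{fact_block}\nQuestion: {question}"


index = build_index(TRIPLES)
paths = find_paths(index, "DrugA", "CancerY")
prompt = build_prompt("Could DrugA be relevant to CancerY?", paths)
```

The resulting prompt string would then be passed to whichever LLM is in use. When no path exists, the prompt states this explicitly, so the model is not encouraged to invent an association.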

Conclusions: The KGT framework substantially improves the accuracy and utility of LLMs in the biomedical field. This study serves as a proof of concept, demonstrating its exceptional performance in biomedical question answering.

Keywords: knowledge graph question answering; large language model; pan-cancer knowledge graph; prompt engineering.

MeSH terms

Algorithms
Computational Biology / methods
Humans
Knowledge Bases
Neoplasms* / genetics
