OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

Jia, Fujian; Liu, Xin; Deng, Lixi; Gu, Jiwen; Pu, Chunchao; Bai, Tunan; Huang, Mengjiang; Lu, Yuanzhi; Liu, Kang

Computer Science > Computation and Language

arXiv:2402.16810 (cs)

[Submitted on 26 Feb 2024]

Title:OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

Authors:Fujian Jia, Xin Liu, Lixi Deng, Jiwen Gu, Chunchao Pu, Tunan Bai, Mengjiang Huang, Yuanzhi Lu, Kang Liu

View PDF

Abstract:In the past year, there has been a growing trend in applying Large Language Models (LLMs) to the field of medicine, particularly with the advent of advanced language models such as ChatGPT developed by OpenAI. However, there is limited research on LLMs specifically addressing oncology-related queries. The primary aim of this research was to develop a specialized language model that demonstrates improved accuracy in providing advice related to oncology. We performed an extensive data collection of online question-answer interactions centered around oncology, sourced from reputable doctor-patient platforms. Following data cleaning and anonymization, a dataset comprising over 180K+ oncology-related conversations was established. The conversations were categorized and meticulously reviewed by field specialists and clinicians to ensure precision. Employing the LLaMA model and other selected open-source datasets, we conducted iterative fine-tuning to enhance the model's proficiency in basic medical conversation and specialized oncology knowledge. We observed a substantial enhancement in the model's understanding of genuine patient inquiries and its reliability in offering oncology-related advice through the utilization of real online question-answer interactions in the fine-tuning process. We release database and models to the research community (this https URL).

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.16810 [cs.CL]
	(or arXiv:2402.16810v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.16810

Submission history

From: Mengjiang Huang [view email]
[v1] Mon, 26 Feb 2024 18:33:13 UTC (722 KB)

Computer Science > Computation and Language

Title:OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators