Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models

Kao, Chang-Sheng; Chen, Yun-Nung

arXiv:2407.03615 (cs)

[Submitted on 4 Jul 2024]

Abstract: Recent advancements in dialogue systems have highlighted the significance of integrating multimodal responses, which enable conveying ideas through diverse modalities rather than solely relying on text-based interactions. This enrichment not only improves overall communicative efficacy but also enhances the quality of conversational experiences. However, existing methods for dialogue-to-image retrieval face limitations due to the constraints of pre-trained vision language models (VLMs) in comprehending complex dialogues accurately. To address this, we present a novel approach leveraging the robust reasoning capabilities of large language models (LLMs) to generate precise dialogue-associated visual descriptors, facilitating seamless connection with images. Extensive experiments conducted on benchmark data validate the effectiveness of our proposed approach in deriving concise and accurate visual descriptors, leading to significant enhancements in dialogue-to-image retrieval performance. Furthermore, our findings demonstrate the method's generalizability across diverse visual cues, various LLMs, and different datasets, underscoring its practicality and potential impact in real-world applications.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.03615 [cs.CL]
	(or arXiv:2407.03615v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.03615

Submission history

From: Chang-Sheng Kao
[v1] Thu, 4 Jul 2024 03:50:30 UTC (11,415 KB)
