Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation

Mendonça, John; Pereira, Patrícia; Moniz, Helena; Carvalho, João Paulo; Lavie, Alon; Trancoso, Isabel

arXiv:2308.16797 (cs)

[Submitted on 31 Aug 2023 (v1), last revised 8 Sep 2023 (this version, v2)]

Abstract: Despite significant research effort in the development of automatic dialogue evaluation metrics, little thought is given to evaluating dialogues other than in English. At the same time, ensuring metrics are invariant to semantically similar responses is also an overlooked topic. To achieve the desired properties of robustness and multilinguality for dialogue evaluation metrics, we propose a novel framework that combines the strengths of current evaluation models with the newly established paradigm of prompting Large Language Models (LLMs). Empirical results show our framework achieves state-of-the-art results in terms of mean Spearman correlation scores across several benchmarks and ranks first on both the Robust and Multilingual tasks of the DSTC11 Track 4 "Automatic Evaluation Metrics for Open-Domain Dialogue Systems", demonstrating the evaluation capabilities of prompted LLMs.
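
The abstract reports results as a mean Spearman correlation across several benchmarks. As a rough illustration of that summary statistic only (not the authors' evaluation code), the sketch below computes Spearman's rank correlation between metric scores and human ratings for each benchmark and then averages the per-benchmark values. The function names and the toy numbers are hypothetical and do not come from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mean_spearman(benchmarks):
    """Average the per-benchmark Spearman correlations.

    `benchmarks` maps a benchmark name to (metric_scores, human_ratings).
    """
    return mean(spearman(m, h) for m, h in benchmarks.values())


# Hypothetical toy data, for illustration only.
toy = {
    "benchmark_a": ([0.1, 0.4, 0.9], [1, 3, 5]),
    "benchmark_b": ([0.2, 0.8, 0.5], [4, 2, 3]),
}
print(mean_spearman(toy))
```

Averaging per-benchmark correlations, rather than pooling all examples into one correlation, keeps benchmarks with different rating scales from distorting one another; whether the paper weights benchmarks equally is an assumption of this sketch.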

Comments:	DSTC11 best paper for Track 4
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2308.16797 [cs.CL]
	(or arXiv:2308.16797v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2308.16797

Submission history

From: John Mendonca
[v1] Thu, 31 Aug 2023 15:19:28 UTC (163 KB)
[v2] Fri, 8 Sep 2023 11:24:06 UTC (163 KB)
