Towards a Metric for Automated Conversational Dialogue System Evaluation and Improvement

Deriu, Jan; Cieliebak, Mark

doi:10.18653/v1/W19-8654

arXiv:1909.12066 (cs)

[Submitted on 26 Sep 2019 (v1), last revised 25 Jun 2020 (this version, v2)]

Abstract: We present "AutoJudge", an automated evaluation method for conversational dialogue systems. The method first generates dialogues through self-talk, i.e., a dialogue system talking to itself. It then uses human ratings of these dialogues to train an automated judgement model. Our experiments show that AutoJudge correlates well with the human ratings and can be used to evaluate dialogue systems automatically, even deployed ones. In a second part, we attempt to apply AutoJudge to improve existing systems. This works well for re-ranking a set of candidate utterances. However, our experiments show that AutoJudge cannot be applied as a reward for reinforcement learning, although the metric can distinguish good dialogues from bad ones. We discuss potential reasons, but state here already that this remains an open question for further research.
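
The two steps named in the abstract, self-talk generation and re-ranking of candidate utterances, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `self_talk`, `rerank`, and `toy_judge` are hypothetical, and `toy_judge` is a crude lexical-variety heuristic standing in for a judgement model trained on human ratings. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a judge maps a dialogue (a list of utterances)
# to a quality score, where higher means better.
Judge = Callable[[Sequence[str]], float]


def self_talk(respond: Callable[[Sequence[str]], str],
              seed: str, turns: int) -> List[str]:
    """Generate a dialogue by letting one system reply to its own output."""
    dialogue = [seed]
    for _ in range(turns):
        dialogue.append(respond(dialogue))
    return dialogue


def rerank(context: Sequence[str], candidates: Sequence[str],
           judge: Judge) -> List[str]:
    """Sort candidates by the judge's score of the context plus candidate."""
    return sorted(candidates,
                  key=lambda c: judge(list(context) + [c]),
                  reverse=True)


def toy_judge(dialogue: Sequence[str]) -> float:
    """Assumed stand-in for a trained model: fraction of distinct words."""
    words = " ".join(dialogue).lower().split()
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    # A trivial system that repeats the previous utterance in upper case.
    dialogue = self_talk(lambda d: d[-1].upper(), "hi", turns=3)
    print(dialogue)
    print(rerank(["hello there"],
                 ["hello there", "how are you today"],
                 toy_judge))
```

In this sketch the judge only ranks a fixed candidate set, which matches the use case the abstract reports as working; nothing here is meant to model the reinforcement-learning setting.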

Comments:	8 pages, to be published at the INLG 2019 conference
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1909.12066 [cs.AI]
	(or arXiv:1909.12066v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1909.12066
Journal reference:	Proceedings of the 12th International Conference on Natural Language Generation. 2019
Related DOI:	https://doi.org/10.18653/v1/W19-8654

Submission history

From: Jan Deriu
[v1] Thu, 26 Sep 2019 12:55:14 UTC (564 KB)
[v2] Thu, 25 Jun 2020 08:00:47 UTC (564 KB)
