Learning Dialog Policies from Weak Demonstrations

Gordon-Hall, Gabriel; Gorinski, Philip John; Cohen, Shay B.

arXiv:2004.11054 (cs)

[Submitted on 23 Apr 2020 (v1), last revised 13 Aug 2020 (this version, v2)]

Abstract: Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user's requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, and get high success rates even when trained on out-of-domain data.
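
The abstract names DQfD without describing its mechanics. As a rough illustration only, the sketch below shows the large-margin supervised loss commonly associated with DQfD, which pushes the Q-value of the demonstrated action above all other actions by at least a margin. The function name, the margin value of 0.8, and the toy Q-values are assumptions made for this example; they are not taken from this listing, and the sketch does not represent the authors' implementation or the Reinforced Fine-tune Learning extension.

```python
from typing import Sequence


def large_margin_loss(
    q_values: Sequence[float],
    expert_action: int,
    margin: float = 0.8,  # illustrative value, an assumption for this sketch
) -> float:
    """Large-margin supervised loss for a single state.

    Computes max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l is 0 for the
    expert (demonstrated) action a_E and `margin` for every other action.
    The loss is 0 once the expert action leads all others by the margin.
    """
    if not 0 <= expert_action < len(q_values):
        raise IndexError("expert_action is out of range for q_values")
    # Add the margin to every action except the demonstrated one.
    augmented = [
        q + (0.0 if action == expert_action else margin)
        for action, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[expert_action]


if __name__ == "__main__":
    toy_q_values = [1.0, 2.0, 0.5]  # hypothetical Q-values for three dialog actions
    print(large_margin_loss(toy_q_values, expert_action=1))
    print(large_margin_loss(toy_q_values, expert_action=0))
```

In this toy case the loss is zero when the demonstrated action already leads by at least the margin, and positive otherwise, so minimizing it nudges the agent toward the demonstrator's choices.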

Comments:	9 pages + 2 pages references + 1 page appendices, 6 figures, 2 tables, 1 algorithm, accepted as long paper at ACL2020
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2004.11054 [cs.CL]
	(or arXiv:2004.11054v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2004.11054

Submission history

From: Philip John Gorinski
[v1] Thu, 23 Apr 2020 10:22:16 UTC (420 KB)
[v2] Thu, 13 Aug 2020 16:02:03 UTC (420 KB)
