Hierarchical Reinforcement Learning for Open-Domain Dialog

Saleh, Abdelrhman; Jaques, Natasha; Ghandeharioun, Asma; Shen, Judy Hanwen; Picard, Rosalind

arXiv:1909.07547 (cs)

[Submitted on 17 Sep 2019 (v1), last revised 31 Dec 2019 (this version, v3)]

Abstract: Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning, VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements -- in terms of both human evaluation and automatic metrics -- over state-of-the-art dialog models, including Transformers.
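
The credit-assignment point in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' VHRL implementation; it only compares how far a single end-of-conversation reward has to propagate when each word is treated as a decision versus when each utterance is. The conversation shape (3 utterances of 10 words), the reward of 1.0, the discount factor of 0.99, and the helper name `discounted_returns` are all assumptions made for illustration.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical toy conversation (assumed values, not from the paper):
# 3 utterances of 10 words each, one reward of 1.0 at the very end.
GAMMA = 0.99
WORDS_PER_UTTERANCE = 10
NUM_UTTERANCES = 3

# Word-level view: every generated word is one decision step.
word_rewards = [0.0] * (WORDS_PER_UTTERANCE * NUM_UTTERANCES - 1) + [1.0]

# Utterance-level view: every utterance is one decision step.
utterance_rewards = [0.0] * (NUM_UTTERANCES - 1) + [1.0]

word_returns = discounted_returns(word_rewards, GAMMA)
utterance_returns = discounted_returns(utterance_rewards, GAMMA)

# Number of decisions the final reward must be attributed across.
print("decision steps:", len(word_returns), "vs", len(utterance_returns))

# Return seen by the first decision under each view.
print("first-step return:", word_returns[0], "vs", utterance_returns[0])
```

In this toy setting the final reward is spread over 30 word-level decisions but only 3 utterance-level decisions, so the first decision sees a less attenuated signal in the utterance-level view. This shows only the shorter horizon; how VHRL actually applies policy gradients to utterance-level embeddings is described in the paper itself.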

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1909.07547 [cs.LG]
	(or arXiv:1909.07547v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1909.07547

Submission history

From: Abdelrhman Saleh
[v1] Tue, 17 Sep 2019 01:57:18 UTC (1,106 KB)
[v2] Wed, 18 Sep 2019 14:25:28 UTC (1,107 KB)
[v3] Tue, 31 Dec 2019 21:23:04 UTC (1,227 KB)
