在以最终到最终可培训、任务导向的对话系统中与人类教学和反馈对话学习 (Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems) - 专知论文

会员服务 ·

0

任务对话系统 · 学成 · INTERACT · 端到端 · 强化学习 ·

2018 年 4 月 18 日

Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems

翻译：在以最终到最终可培训、任务导向的对话系统中与人类教学和反馈对话学习

Bing Liu,Gokhan Tur,Dilek Hakkani-Tur,Pararth Shah,Larry Heck

from arxiv, To appear in NAACL 2018 as a long paper

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. Efficiency of such learning method may suffer from the mismatch of dialogue state distribution between offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistake it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent's capability in successfully completing a task.

翻译：在这项工作中,我们提出一种混合学习方法,用于通过在线用户互动来培训面向任务的对话系统。学习面向任务的对话的普及方法包括运用用户对受监督的培训前模式的反馈进行强化学习。这种学习方法的效率可能因离线培训与在线互动学习阶段之间对话状态分配不匹配而受到影响。为了应对这一挑战,我们提议一种混合模仿和强化学习方法,对话代理可以通过学习人类的教学和反馈,从与用户的互动中有效地学习。我们设计了一种基于神经网络的面向任务的对话代理,可以通过拟议的学习方法优化最终到终端。实验结果显示,我们的终端到终端对话代理可以通过用户教学的模仿学习来有效地从错误中学习。在模仿学习阶段后运用用户反馈来强化学习,进一步提高该代理成功完成任务的能力。

6

相关内容

任务对话系统

任务对话系统

【CVPR2020-Oral-中科院自动化所】元人脸识别，Learning Meta Face Recognition

【CVPR2020-Oral-中科院自动化所】元人脸识别，Learning Meta Face Recognition

专知会员服务

24+阅读 · 2020年3月20日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

【Tom Kocmi博士论文】探讨迁移学习在神经机器翻译中的应用，Exploring Benefits of Transfer Learning in Neural Machine Translation

【Tom Kocmi博士论文】探讨迁移学习在神经机器翻译中的应用，Exploring Benefits of Transfer Learning in Neural Machine Translation

专知会员服务

10+阅读 · 2020年1月9日

【大规模数据系统，552页ppt】Large-scale Data Systems

【大规模数据系统，552页ppt】Large-scale Data Systems

专知会员服务

61+阅读 · 2019年12月21日

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

专知会员服务

30+阅读 · 2019年12月10日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU】机器学习导论课程（Introduction to Machine Learning）

【CMU】机器学习导论课程（Introduction to Machine Learning）

专知会员服务

61+阅读 · 2019年8月26日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新5篇聊天机器人（Chatbot）相关论文—深度强化学习、社交聊天机器人小冰、对话聊天助手、序列-序列、动态词汇

【论文推荐】最新5篇聊天机器人（Chatbot）相关论文—深度强化学习、社交聊天机器人小冰、对话聊天助手、序列-序列、动态词汇

专知

23+阅读 · 2018年1月30日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Arxiv

11+阅读 · 2019年11月4日

Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems

Arxiv

6+阅读 · 2019年6月11日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

Learning Personalized End-to-End Goal-Oriented Dialog

Arxiv

4+阅读 · 2018年11月12日

Multi-turn Dialogue Response Generation in an Adversarial Learning Framework

Arxiv

4+阅读 · 2018年6月11日

End-to-end Active Object Tracking via Reinforcement Learning

Arxiv

3+阅读 · 2018年6月1日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Can Neural Machine Translation be Improved with User Feedback?

Arxiv

3+阅读 · 2018年4月16日

Human Interaction with Recommendation Systems

Arxiv

6+阅读 · 2018年3月28日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

任务对话系统

相关VIP内容

【CVPR2020-Oral-中科院自动化所】元人脸识别，Learning Meta Face Recognition

【CVPR2020-Oral-中科院自动化所】元人脸识别，Learning Meta Face Recognition

专知会员服务

24+阅读 · 2020年3月20日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

【Tom Kocmi博士论文】探讨迁移学习在神经机器翻译中的应用，Exploring Benefits of Transfer Learning in Neural Machine Translation

【Tom Kocmi博士论文】探讨迁移学习在神经机器翻译中的应用，Exploring Benefits of Transfer Learning in Neural Machine Translation

专知会员服务

10+阅读 · 2020年1月9日

【大规模数据系统，552页ppt】Large-scale Data Systems

【大规模数据系统，552页ppt】Large-scale Data Systems

专知会员服务

61+阅读 · 2019年12月21日

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

专知会员服务

30+阅读 · 2019年12月10日

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

【CCL 2019】ATT-第19期：文本生成 |Text Generation: From the Perspective of Interactive Inference （张家俊）

专知会员服务

43+阅读 · 2019年11月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【CMU】机器学习导论课程（Introduction to Machine Learning）

【CMU】机器学习导论课程（Introduction to Machine Learning）

专知会员服务

61+阅读 · 2019年8月26日

热门VIP内容

开通专知VIP会员享更多权益服务

美陆军五大转型方向

一种Agent自主性风险评估框架 | 最新文献

实时无人机指令处理：一种面向无人机系统的大语言模型方法

基于动态知识图谱的人工智能代理自主研究周期 | 文献

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新5篇聊天机器人（Chatbot）相关论文—深度强化学习、社交聊天机器人小冰、对话聊天助手、序列-序列、动态词汇

【论文推荐】最新5篇聊天机器人（Chatbot）相关论文—深度强化学习、社交聊天机器人小冰、对话聊天助手、序列-序列、动态词汇

专知

23+阅读 · 2018年1月30日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Arxiv

11+阅读 · 2019年11月4日

Learning a Matching Model with Co-teaching for Multi-turn Response Selection in Retrieval-based Dialogue Systems

Arxiv

6+阅读 · 2019年6月11日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

Learning Personalized End-to-End Goal-Oriented Dialog

Arxiv

4+阅读 · 2018年11月12日

Multi-turn Dialogue Response Generation in an Adversarial Learning Framework

Arxiv

4+阅读 · 2018年6月11日

End-to-end Active Object Tracking via Reinforcement Learning

Arxiv

3+阅读 · 2018年6月1日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Can Neural Machine Translation be Improved with User Feedback?

Arxiv

3+阅读 · 2018年4月16日

Human Interaction with Recommendation Systems

Arxiv

6+阅读 · 2018年3月28日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

微信扫码咨询专知VIP会员