" 用户想要什么? " 获取信息促进等级对话政策优化 (What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation) - 专知论文

会员服务 ·

0

任务对话系统 · INFORMS · 学成 · state-of-the-art · Integration ·

2021 年 9 月 15 日

What Does The User Want? Information Gain for Hierarchical Dialogue Policy Optimisation

翻译：" 用户想要什么? " 获取信息促进等级对话政策优化

Christian Geishauser,Songbo Hu,Hsien-chin Lin,Nurul Lubis,Michael Heck,Shutong Feng,Carel van Niekerk,Milica Gašić

The dialogue management component of a task-oriented dialogue system is typically optimised via reinforcement learning (RL). Optimisation via RL is highly susceptible to sample inefficiency and instability. The hierarchical approach called Feudal Dialogue Management takes a step towards more efficient learning by decomposing the action space. However, it still suffers from instability due to the reward only being provided at the end of the dialogue. We propose the usage of an intrinsic reward based on information gain to address this issue. Our proposed reward favours actions that resolve uncertainty or query the user whenever necessary. It enables the policy to learn how to retrieve the users' needs efficiently, which is an integral aspect in every task-oriented conversation. Our algorithm, which we call FeudalGain, achieves state-of-the-art results in most environments of the PyDial framework, outperforming much more complex approaches. We confirm the sample efficiency and stability of our algorithm through experiments in simulation and a human trial.

翻译：以任务为导向的对话系统的对话管理部分通常是通过强化学习优化的。通过强化学习优化RL的优化极易受到低效率和不稳定的抽样影响。称为Feudal对话管理(Feudal 对话管理)的等级式方法通过分解行动空间,朝着更高效的学习迈出了一步。然而,由于在对话结束时才提供奖励,它仍然不稳定。我们提议利用基于信息收益的内在奖励来解决这一问题。我们提议的奖励有利于解决不确定性的行动,或在必要时询问用户。它使得政策能够学习如何有效地满足用户的需要,这是每个任务性对话的一个不可分割的方面。我们称之为FeudalGain的算法在PyDial框架的大多数环境中都取得了最先进的结果,其表现远超过更复杂的方法。我们通过模拟试验和人类试验确认了我们算法的抽样效率和稳定性。

0

相关内容

任务对话系统

任务对话系统

【Google-WWW2020】会话域探索的动态组合， Conversational Domain Exploration

专知会员服务

10+阅读 · 2020年3月22日

【2020密歇根大学论文】基于学习的序列决策算法的公平性综述论文，Fairness in Learning-Based Sequential Decision Algorithms: A Survey

【2020密歇根大学论文】基于学习的序列决策算法的公平性综述论文，Fairness in Learning-Based Sequential Decision Algorithms: A Survey

专知会员服务

22+阅读 · 2020年1月15日

【Google新论文】Learning Transferable Graph Exploration 附论文下载

【Google新论文】Learning Transferable Graph Exploration 附论文下载

专知会员服务

8+阅读 · 2019年11月4日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

专知会员服务

33+阅读 · 2019年9月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇图像检索相关论文—多模态反馈、二值约束深度哈希、绘制草图、对话交互式、多目标图像检索

【论文推荐】最新六篇图像检索相关论文—多模态反馈、二值约束深度哈希、绘制草图、对话交互式、多目标图像检索

专知

14+阅读 · 2018年6月11日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

计算机类 | 国际会议信息7条

计算机类 | 国际会议信息7条

Call4Papers

3+阅读 · 2017年11月17日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Hierarchies of Planning and Reinforcement Learning for Robot Navigation

Arxiv

0+阅读 · 2021年11月5日

Cooperative Transportation of UAVs Without Inter-UAV Communication

Arxiv

0+阅读 · 2021年11月5日

Pull or Wait: How to Optimize Query Age of Information

Arxiv

0+阅读 · 2021年11月4日

Hierarchically Integrated Models: Learning to Navigate from Heterogeneous Robots

Arxiv

0+阅读 · 2021年11月4日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Arxiv

11+阅读 · 2018年6月16日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Learning to Adapt: Meta-Learning for Model-Based Control

Arxiv

9+阅读 · 2018年3月30日

Towards an Engine for Lifelong Interactive Knowledge Learning in Human-Machine Conversations

Arxiv

5+阅读 · 2018年2月16日

VIP会员

文章信息

相关主题

任务对话系统

state-of-the-art

相关VIP内容

【Google-WWW2020】会话域探索的动态组合， Conversational Domain Exploration

专知会员服务

10+阅读 · 2020年3月22日

【2020密歇根大学论文】基于学习的序列决策算法的公平性综述论文，Fairness in Learning-Based Sequential Decision Algorithms: A Survey

【2020密歇根大学论文】基于学习的序列决策算法的公平性综述论文，Fairness in Learning-Based Sequential Decision Algorithms: A Survey

专知会员服务

22+阅读 · 2020年1月15日

【Google新论文】Learning Transferable Graph Exploration 附论文下载

【Google新论文】Learning Transferable Graph Exploration 附论文下载

专知会员服务

8+阅读 · 2019年11月4日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

【RecSys 2019报告】基于对话的推荐（Context Adaptation with Session‐based Recommenders）

专知会员服务

33+阅读 · 2019年9月20日

热门VIP内容

开通专知VIP会员享更多权益服务

自动驾驶轨迹规划中的基础模型：进展综述与开放挑战

《用于提升多域战备的大型语言模型辅助场景生成器》报告

【斯坦福博士论文】为人类使用优化 AI 模型

国防领域人工智能规模化应用的理论与实践

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇图像检索相关论文—多模态反馈、二值约束深度哈希、绘制草图、对话交互式、多目标图像检索

【论文推荐】最新六篇图像检索相关论文—多模态反馈、二值约束深度哈希、绘制草图、对话交互式、多目标图像检索

专知

14+阅读 · 2018年6月11日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

计算机类 | 国际会议信息7条

计算机类 | 国际会议信息7条

Call4Papers

3+阅读 · 2017年11月17日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Hierarchies of Planning and Reinforcement Learning for Robot Navigation

Arxiv

0+阅读 · 2021年11月5日

Cooperative Transportation of UAVs Without Inter-UAV Communication

Arxiv

0+阅读 · 2021年11月5日

Pull or Wait: How to Optimize Query Age of Information

Arxiv

0+阅读 · 2021年11月4日

Hierarchically Integrated Models: Learning to Navigate from Heterogeneous Robots

Arxiv

0+阅读 · 2021年11月4日

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Arxiv

5+阅读 · 2020年4月2日

Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Arxiv

11+阅读 · 2018年6月16日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Dialog-based Interactive Image Retrieval

Arxiv

5+阅读 · 2018年5月1日

Learning to Adapt: Meta-Learning for Model-Based Control

Arxiv

9+阅读 · 2018年3月30日

Towards an Engine for Lifelong Interactive Knowledge Learning in Human-Machine Conversations

Arxiv

5+阅读 · 2018年2月16日

微信扫码咨询专知VIP会员