WeaSuL: 薄弱的受监督的对话政策学习:多方向对话的回报估计 (WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue) - 专知论文

会员服务 ·

0

任务对话系统 · 估计/估计量 · 监督 · INTERACT · MoDELS ·

2021 年 8 月 4 日

WeaSuL: Weakly Supervised Dialogue Policy Learning: Reward Estimation for Multi-turn Dialogue

翻译：WeaSuL: 薄弱的受监督的对话政策学习:多方向对话的回报估计

Anant Khandelwal

from arxiv, 12 pages, 1 figures, 2 tables, Accepted at DialDoc'21 at ACL-IJCNLP 2021

An intelligent dialogue system in a multi-turn setting should not only generate the responses which are of good quality, but it should also generate the responses which can lead to long-term success of the dialogue. Although, the current approaches improved the response quality, but they over-look the training signals present in the dialogue data. We can leverage these signals to generate the weakly supervised training data for learning dialog policy and reward estimator, and make the policy take actions (generates responses) which can foresee the future direction for a successful (rewarding) conversation. We simulate the dialogue between an agent and a user (modelled similar to an agent with supervised learning objective) to interact with each other. The agent uses dynamic blocking to generate ranked diverse responses and exploration-exploitation to select among the Top-K responses. Each simulated state-action pair is evaluated (works as a weak annotation) with three quality modules: Semantic Relevant, Semantic Coherence and Consistent Flow. Empirical studies with two benchmarks indicate that our model can significantly out-perform the response quality and lead to a successful conversation on both automatic evaluation and human judgement.

翻译：在多转环境中的智能对话系统不仅应当产生高质量的反应,而且还应当产生能够导致对话长期成功的反应。虽然目前的方法提高了反应质量,但是它们过于看重对话数据中存在的培训信号。我们可以利用这些信号来生成监督不力的培训数据,用于学习对话政策和奖赏估测器,并使政策采取能够预见成功(奖励)对话未来方向的行动(基因反应)。我们模拟代理人与用户(类似于有监督学习目标的代理人)之间的对话,以便彼此互动。代理利用动态阻塞来生成排位不同的反应和探索探索探索,以便在顶级反应中进行选择。对每对模拟状态行动配对的评价(工作作为微弱的注解)有三个质量模块:语义相关、语义一致性和一致性流动。有两个基准的“经验性”研究表明,我们的模型可以大大超出反应质量,并导致关于自动评价和人判断的成功对话。

0

相关内容

任务对话系统

任务对话系统

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

62+阅读 · 2020年8月6日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

专知会员服务

37+阅读 · 2020年4月10日

【ACL2019】基于学习注意力机制的知识图谱中关系预测的嵌入 Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

【ACL2019】基于学习注意力机制的知识图谱中关系预测的嵌入 Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

专知会员服务

122+阅读 · 2020年3月29日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

跟踪SLAM前沿动态系列之ICCV2019

跟踪SLAM前沿动态系列之ICCV2019

泡泡机器人SLAM

7+阅读 · 2019年11月23日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

ACL 2018 计算语言学协会接受论文列表

ACL 2018 计算语言学协会接受论文列表

专知

3+阅读 · 2018年4月27日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Arxiv

5+阅读 · 2019年3月8日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation

Arxiv

5+阅读 · 2018年10月3日

Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation

Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation

Arxiv

5+阅读 · 2018年7月11日

Multi-turn Dialogue Response Generation in an Adversarial Learning Framework

Arxiv

4+阅读 · 2018年6月11日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

Arxiv

6+阅读 · 2018年3月27日

Self Paced Deep Learning for Weakly Supervised Object Detection

Arxiv

8+阅读 · 2018年2月21日

VIP会员

文章信息

相关主题

任务对话系统

估计/估计量

相关VIP内容

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

62+阅读 · 2020年8月6日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

【ACL2020-浙大-微软】多轮对话推理数据集，MuTual: A Dataset for Multi-Turn Dialogue Reasoning

专知会员服务

37+阅读 · 2020年4月10日

【ACL2019】基于学习注意力机制的知识图谱中关系预测的嵌入 Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

【ACL2019】基于学习注意力机制的知识图谱中关系预测的嵌入 Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

专知会员服务

122+阅读 · 2020年3月29日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

【超分辨率| 2019最新综述】图像超分辨率的深度学习，附PDF（Deep Learning for Image Super-resolution: A Survey）

专知会员服务

60+阅读 · 2019年11月16日

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

【AAAI2020接受论文】多任务自监督学习的不流利检测，Multi-Task Self-Supervised Learning for Disfluency Detection

专知会员服务

14+阅读 · 2019年11月11日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

用于无人机的C波段空地通信系统研究 | 2025最新116页

甚高频军事战术通信系统传播性能分析研究

军事通信系统：安全行动的支柱

卫星与地面通信系统：美陆军面临的空间与电子战局势 | 39页报告

相关资讯

跟踪SLAM前沿动态系列之ICCV2019

跟踪SLAM前沿动态系列之ICCV2019

泡泡机器人SLAM

7+阅读 · 2019年11月23日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

ACL 2018 计算语言学协会接受论文列表

ACL 2018 计算语言学协会接受论文列表

专知

3+阅读 · 2018年4月27日

相关论文

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Arxiv

3+阅读 · 2020年5月13日

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images

Arxiv

5+阅读 · 2019年3月8日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation

Arxiv

5+阅读 · 2018年10月3日

Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation

Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation

Arxiv

5+阅读 · 2018年7月11日

Multi-turn Dialogue Response Generation in an Adversarial Learning Framework

Arxiv

4+阅读 · 2018年6月11日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

Arxiv

6+阅读 · 2018年3月27日

Self Paced Deep Learning for Weakly Supervised Object Detection

Arxiv

8+阅读 · 2018年2月21日

微信扫码咨询专知VIP会员