Task-Oriented Dialogue (TOD) systems have been drawing increasing attention in recent studies. Current methods focus on constructing pre-trained models or fine-tuning strategies, while the evaluation of TOD systems is limited by a policy mismatch problem: during evaluation, the user utterances are taken from the annotated dataset, whereas they should respond to the previously generated system responses, which can have many alternatives beyond the annotated texts. In this work, we therefore propose an interactive evaluation framework for TOD. We first build a goal-oriented user simulator based on pre-trained models and then let the user simulator interact with the dialogue system to generate dialogues. In addition, we introduce a sentence-level score and a session-level score to measure sentence fluency and session coherence in the interactive evaluation. Experimental results show that RL-based TOD systems trained with our proposed user simulator achieve nearly 98% inform and success rates in the interactive evaluation on the MultiWOZ dataset, and that the proposed scores capture response quality beyond inform and success rates. We hope our work will encourage simulator-based interactive evaluation for the TOD task.
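To make the interactive setup concrete, the following is a minimal sketch of a simulator-based evaluation loop of the kind described above. It assumes hypothetical `UserSimulator` and `DialogueSystem` interfaces with the method names shown; these names and the turn-level bookkeeping are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of an interactive evaluation loop: the user simulator conditions on
# the goal and on *generated* system responses rather than annotated ones,
# which is what avoids the policy mismatch problem described above.
# UserSimulator/DialogueSystem and their methods are hypothetical.

def interactive_evaluation(simulator, system, goal, max_turns=20):
    """Run one goal-driven dialogue and report inform/success outcomes."""
    dialogue = []
    system_response = ""  # the system replies after the first user turn
    for _ in range(max_turns):
        # Simulator produces the next user utterance given the goal and the
        # system's previous (generated) response.
        user_utterance = simulator.next_utterance(goal, system_response)
        if simulator.is_finished():
            break
        system_response = system.respond(user_utterance)
        dialogue.append((user_utterance, system_response))
    return {
        "dialogue": dialogue,
        "inform": simulator.all_requested_slots_informed(goal),
        "success": simulator.goal_completed(goal),
    }
```

Sentence-level fluency and session-level coherence scores would then be computed over the generated `dialogue`, complementing the inform and success rates.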