Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully handle the request if the user responded differently yet still consistently with the dialogue flow? CoCo leverages turn-level belief states as counterfactual conditionals to produce novel conversation scenarios in two steps: (i) counterfactual goal generation at the turn level by dropping and adding slots followed by replacing slot values, and (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow. Evaluating state-of-the-art DST models on the MultiWOZ dataset with CoCo-generated counterfactuals results in a significant performance drop of up to 30.8% (from 49.4% to 18.6%) in absolute joint goal accuracy. In comparison, widely used techniques such as paraphrasing affect the accuracy by at most 2%. Human evaluations show that CoCo-generated conversations perfectly reflect the underlying user goal with more than 95% accuracy and are as human-like as the original conversations, further strengthening CoCo's reliability and its promise for adoption as part of the robustness evaluation of DST models. Code is available at https://github.com/salesforce/coco-dst.
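To make step (i) concrete, below is a minimal, hypothetical sketch of turn-level counterfactual goal generation (drop slots, add slots, then replace values). The slot names, value pool, and drop/add probabilities are illustrative assumptions, not the authors' exact implementation; step (ii), generating the counterfactual user utterance conditioned on this goal, requires a trained conditional generation model and is omitted here.

```python
# Illustrative sketch only: the policy and slot/value inventory are assumptions.
import random
from typing import Dict, List


def counterfactual_goal(
    turn_belief: Dict[str, str],        # turn-level belief state, e.g. {"restaurant-food": "italian"}
    value_pool: Dict[str, List[str]],   # candidate values per slot, used for adding and replacing
    addable_slots: List[str],           # slots that may be newly introduced into the goal
    drop_prob: float = 0.3,
    add_prob: float = 0.3,
    seed: int = 0,
) -> Dict[str, str]:
    """Step (i): drop and add slots, then replace slot values, to form a novel user goal."""
    rng = random.Random(seed)

    # (a) Drop: randomly remove some slots from the original turn-level goal.
    goal = {s: v for s, v in turn_belief.items() if rng.random() > drop_prob}

    # (b) Add: randomly introduce slots that are not already in the goal.
    for slot in addable_slots:
        if slot not in goal and rng.random() < add_prob:
            goal[slot] = rng.choice(value_pool[slot])

    # (c) Replace: swap each value for a different one drawn from the value pool.
    for slot, value in goal.items():
        alternatives = [v for v in value_pool.get(slot, []) if v != value]
        if alternatives:
            goal[slot] = rng.choice(alternatives)

    return goal


# Toy MultiWOZ-style example.
turn_belief = {"restaurant-food": "italian", "restaurant-area": "centre"}
value_pool = {
    "restaurant-food": ["italian", "chinese", "indian"],
    "restaurant-area": ["centre", "north", "south"],
    "restaurant-pricerange": ["cheap", "moderate", "expensive"],
}
print(counterfactual_goal(turn_belief, value_pool, ["restaurant-pricerange"]))
```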