A dialogue is essentially a multi-turn interaction among interlocutors, and effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics focus primarily on turn-level quality while ignoring these dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework that is not only capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. In DynaEval, a graph convolutional network (GCN) is adopted to model a dialogue in its totality, where the graph nodes denote individual utterances and the edges represent dependencies between pairs of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model and correlates strongly with human judgements across multiple dialogue evaluation aspects at both the turn and dialogue level.
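To make the described architecture concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the idea in the abstract: utterance embeddings become graph nodes, edges link nearby turns, a GCN layer propagates information across the dialogue, and a margin-based contrastive loss pushes a well-formed dialogue above a perturbed negative. The window-based edge rule, the pooling and scoring heads, and all names (`SimpleDialogueGCN`, `window_adjacency`, `contrastive_loss`) are assumptions introduced here for illustration.

```python
# Hypothetical sketch of a DynaEval-style dialogue-graph scorer; all design
# details beyond "GCN over utterance nodes + contrastive loss" are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


def window_adjacency(num_utts: int, window: int = 2) -> torch.Tensor:
    """Adjacency (with self-loops) connecting each utterance to neighbours
    within a fixed turn window -- a simplifying assumption."""
    idx = torch.arange(num_utts)
    adj = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs() <= window
    return adj.float()


class SimpleDialogueGCN(nn.Module):
    """One symmetric-normalised GCN layer, mean pooling, and a scorer."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.gcn_weight = nn.Linear(dim, dim, bias=False)
        self.scorer = nn.Linear(dim, 1)

    def forward(self, utt_emb: torch.Tensor) -> torch.Tensor:
        # utt_emb: (num_utts, dim) pre-computed utterance embeddings
        adj = window_adjacency(utt_emb.size(0))
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        hidden = F.relu(norm_adj @ self.gcn_weight(utt_emb))  # message passing
        dialogue_vec = hidden.mean(dim=0)                      # pool to one vector
        return self.scorer(dialogue_vec)                       # dialogue-level score


def contrastive_loss(pos_score, neg_score, margin: float = 1.0):
    """Margin loss: a coherent dialogue should score higher than its
    perturbed (e.g. utterance-shuffled) counterpart."""
    return F.relu(margin - pos_score + neg_score).mean()


if __name__ == "__main__":
    model = SimpleDialogueGCN(dim=128)
    pos_dialogue = torch.randn(6, 128)              # 6 utterance embeddings
    neg_dialogue = pos_dialogue[torch.randperm(6)]  # shuffled-turn negative
    loss = contrastive_loss(model(pos_dialogue), model(neg_dialogue))
    loss.backward()
    print(float(loss))
```

The shuffled-turn negative above is only one plausible way to construct the "carefully constructed negative samples" the abstract mentions; the key point the sketch illustrates is that the contrastive objective operates on dialogue-level scores produced from graph-structured utterance representations.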