Recently, there has been growing interest in building text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics cannot recognize coherence and fail to punish incoherent elements in system outputs. In this work, we introduce DiscoScore, a discourse metric with multiple variants, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human-rated coherence than early discourse metrics, invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operating at the system level -- which is particularly problematic, as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we provide justification for the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at \url{https://github.com/AIPHES/DiscoScore}.
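As a point of clarification on the evaluation setup: a minimal, hypothetical sketch (not code from the paper) of how system-level correlation is typically computed. A metric's scores are averaged per system, human ratings are averaged per system, and the two vectors of system means are then correlated. All data and names below are illustrative assumptions.

```python
def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def system_level_correlation(metric_scores, human_scores):
    """Correlate per-system mean metric scores with per-system
    mean human ratings (system-level, as opposed to segment-level,
    which would correlate scores per individual output)."""
    systems = sorted(metric_scores)
    m = [sum(metric_scores[s]) / len(metric_scores[s]) for s in systems]
    h = [sum(human_scores[s]) / len(human_scores[s]) for s in systems]
    return pearson(m, h)

# Toy illustration with three hypothetical systems:
metric = {"sysA": [0.2, 0.4], "sysB": [0.6, 0.8], "sysC": [0.1, 0.3]}
human = {"sysA": [3, 4], "sysB": [7, 8], "sysC": [1, 2]}
r = system_level_correlation(metric, human)
```

A metric can look reasonable segment by segment yet rank whole systems poorly, which is why the abstract treats weak system-level correlation as especially problematic.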