While probabilistic language generators have improved dramatically over the last few years, the automatic evaluation metrics used to assess them have not kept pace with this progress. In the domain of language generation, a good metric must correlate highly with human judgements. Yet, with few exceptions, there is a lack of such metrics in the literature. In this work, we analyse the general paradigm of language generator evaluation. We first discuss the computational and qualitative issues with using automatic evaluation metrics that operate on probability distributions over strings, the backbone of most language generators. We then propose the use of distributions over clusters instead, where we cluster strings based on their text embeddings (obtained from a pretrained language model). While we find the biases introduced by this substitution to be quite strong, we observe that, empirically, this methodology leads to metric estimators with higher correlation with human judgements, while simultaneously reducing estimator variance. We finish the paper with a probing analysis, which leads us to conclude that -- by encoding syntactic- and coherence-level features of text, while ignoring surface-level features -- these clusters may simply be better equipped to evaluate state-of-the-art language models.
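The core proposal, replacing distributions over strings with distributions over embedding-derived clusters, can be illustrated with a short sketch. The snippet below clusters sentence embeddings from a pretrained encoder, estimates an empirical distribution over clusters for human and model text, and compares the two with a forward KL divergence. The encoder name, cluster count, smoothing constant, and choice of divergence are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: estimate distributions over clusters of text embeddings
# and compare model-generated text to human references via a divergence.
# Encoder, cluster count, and divergence are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def cluster_distribution(texts, kmeans, encoder):
    """Map strings to clusters and return the smoothed empirical cluster distribution."""
    labels = kmeans.predict(encoder.encode(texts))
    counts = np.bincount(labels, minlength=kmeans.n_clusters).astype(float)
    counts += 1e-9  # smooth to avoid log(0) in the divergence below
    return counts / counts.sum()

# Hypothetical inputs: human-written references and model generations.
human_texts = ["The cat sat on the mat.", "It rained all afternoon."]
model_texts = ["A cat was sitting on a mat.", "Rain fell through the afternoon."]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained text embedder
kmeans = KMeans(n_clusters=2, random_state=0).fit(
    encoder.encode(human_texts + model_texts)
)

p = cluster_distribution(human_texts, kmeans, encoder)
q = cluster_distribution(model_texts, kmeans, encoder)
kl = float(np.sum(p * np.log(p / q)))  # forward KL between cluster distributions
print(f"KL(human || model) over clusters: {kl:.4f}")
```

In practice, the corpora and cluster count would be far larger; the point is only that metric estimators now operate on a low-dimensional categorical distribution over clusters rather than on probabilities assigned to individual strings.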