On the Limits of Minimal Pairs in Contrastive Evaluation


September 15, 2021


Jannis Vamvas, Rico Sennrich

From arXiv, BlackboxNLP 2021

Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Secondly, test data should be chosen such as to minimize distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs are created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English-German MT that implements this recommendation.
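As a rough illustration of the contrastive evaluation setup the abstract refers to, the sketch below counts how often a scorer prefers the correct member of a minimal pair over its contrastive variant. It is not the authors' evaluation suite: `contrastive_accuracy`, `toy_score`, the token probabilities, and the example sentences are all hypothetical, and `toy_score` merely stands in for a real MT model's log-probability of a translation given its source.

```python
import math
from typing import Callable, List, Tuple

# A minimal pair: (source, correct translation, contrastive variant with one change).
MinimalPair = Tuple[str, str, str]


def contrastive_accuracy(
    pairs: List[MinimalPair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the correct variant gets a strictly higher score."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    wins = sum(1 for src, good, bad in pairs if score(src, good) > score(src, bad))
    return wins / len(pairs)


def toy_score(source: str, target: str) -> float:
    """Hypothetical stand-in for log p(target | source) from an MT model.

    It ignores the source and sums made-up per-token log-probabilities,
    so it is biased toward 'Katze' regardless of what the source says.
    """
    toy_probs = {"Katze": 0.6, "Hund": 0.1}  # assumed values, for illustration only
    return sum(math.log(toy_probs.get(tok, 0.05)) for tok in target.split())


# Made-up English-German pairs; each contrastive variant swaps a single noun.
pairs: List[MinimalPair] = [
    ("The cat sleeps", "Die Katze schläft", "Die Hund schläft"),
    ("The dog sleeps", "Der Hund schläft", "Der Katze schläft"),
]

accuracy = contrastive_accuracy(pairs, toy_score)
print(accuracy)
```

The toy scorer gets the first pair right only because of its bias toward one token, which loosely mirrors the abstract's warning that a score on contrastive pairs need not predict how a model behaves when it generates translations itself.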

