Searching for a higher power in the human evaluation of MT - Zhuanzhi Papers

Pairwise · Statistics · Machine translation · Identifiable · Variance reduction

October 20, 2022

Johnny Tian-Zheng Wei,Tom Kocmi,Christian Federmann

From arXiv, WMT 2022

In MT evaluation, pairwise comparisons are conducted to identify the better system. In conducting the comparison, the experimenter must allocate a budget to collect Direct Assessment (DA) judgments. We provide a cost-effective way to spend the budget, but show that typical budget sizes often do not allow for a solid comparison. Taking the perspective that the basis of a solid comparison is achieving statistical significance, we study the power (rate of achieving significance) on a large collection of pairwise DA comparisons. Due to the nature of statistical estimation, power is low for differentiating systems less than 1-2 DA points apart, and achieving a notable increase in power requires at least 2-3x more samples. Applying variance reduction alone will not yield these gains, so we must face the reality of undetectable differences and spending increases. In this context, we propose interim testing, an "early stopping" collection procedure that yields more power per judgment collected and adaptively focuses the budget on pairs that are borderline significant. Interim testing can achieve up to a 27% efficiency gain when spending 3x the current budget, or 18% savings at the current evaluation power.
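The link between power, sample size, and early stopping described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' procedure. It assumes that per-judgment score differences between two systems are normally distributed with mean `delta` and standard deviation `sigma`, that significance is decided by a two-sided z-test, and that interim looks are corrected with a simple Bonferroni split of alpha. The function names and all parameter values are hypothetical. Under the normal approximation, power depends on `delta * sqrt(n) / sigma`, so halving the detectable difference needs roughly four times as many judgments.

```python
import math
import random
import statistics


def z_crit(alpha, looks=1):
    """Two-sided critical value, with alpha split evenly across looks (Bonferroni)."""
    return statistics.NormalDist().inv_cdf(1 - alpha / (2 * looks))


def is_significant(diffs, crit):
    """Two-sided z-test that the mean score difference is non-zero."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        return False
    return abs(mean) / math.sqrt(var / n) > crit


def simulate(delta, sigma, batch, looks, crit, trials, seed=0):
    """Estimate (power, mean judgments used) when stopping at the first significant look.

    With looks=1 this is an ordinary fixed-size test on `batch` judgments.
    """
    rng = random.Random(seed)
    hits = 0
    used = 0
    for _ in range(trials):
        diffs = []
        for _ in range(looks):
            # Collect one more batch of judgments, then test on everything so far.
            diffs.extend(rng.gauss(delta, sigma) for _ in range(batch))
            if is_significant(diffs, crit):
                hits += 1
                break
        used += len(diffs)
    return hits / trials, used / trials


if __name__ == "__main__":
    delta, sigma = 2.0, 25.0  # hypothetical: a 2-point DA gap, noisy judgments
    for n in (500, 1000, 1500):
        power, _ = simulate(delta, sigma, n, 1, z_crit(0.05), 500)
        print(f"fixed n={n}: power={power:.2f}")
    power, used = simulate(delta, sigma, 500, 3, z_crit(0.05, looks=3), 500)
    print(f"interim (3 looks of 500): power={power:.2f}, mean judgments={used:.0f}")
```

Running the script prints the estimated power at three fixed budgets, then the power and average number of judgments used when the same maximum budget is spent in three looks. Bonferroni is a conservative choice here; other sequential corrections would trade power and stopping time differently.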
