ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics - Zhuanzhi Papers

Model Evaluation · Machine Translation · INFORMS · Performer

December 6, 2022

ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics

Chantal Amrhein, Nikita Moghe, Liane Guillou

From arXiv; preprint for WMT 2022 with updated tables

As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.
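To make the idea of a challenge set concrete, here is a minimal sketch of how a metric might be scored on contrastive examples. It assumes each example pairs a good translation with one containing an accuracy error, and it uses a Kendall's tau-like statistic (concordant minus discordant pairs, divided by the total). The abstract does not state the exact scoring procedure, so treat this as an illustration only. The names `ChallengeExample`, `kendall_tau_like`, and `token_overlap` are hypothetical, and the example sentences are invented rather than taken from ACES.

```python
from typing import Callable, Iterable, NamedTuple


class ChallengeExample(NamedTuple):
    source: str
    reference: str
    good: str       # translation without the accuracy error
    incorrect: str  # translation containing the accuracy error


# A metric maps (source, hypothesis, reference) to a score; higher is better.
Metric = Callable[[str, str, str], float]


def kendall_tau_like(metric: Metric, examples: Iterable[ChallengeExample]) -> float:
    """Return (concordant - discordant) / total over contrastive pairs."""
    concordant = discordant = 0
    for ex in examples:
        good_score = metric(ex.source, ex.good, ex.reference)
        bad_score = metric(ex.source, ex.incorrect, ex.reference)
        if good_score > bad_score:
            concordant += 1
        else:
            discordant += 1  # assumption: ties count against the metric
    total = concordant + discordant
    if total == 0:
        raise ValueError("at least one example is required")
    return (concordant - discordant) / total


def token_overlap(source: str, hypothesis: str, reference: str) -> float:
    """Toy surface metric: fraction of reference tokens found in the hypothesis."""
    ref = reference.lower().split()
    hyp = set(hypothesis.lower().split())
    return sum(tok in hyp for tok in ref) / len(ref) if ref else 0.0


# Invented examples for illustration only.
examples = [
    ChallengeExample(
        source="Nehmen Sie zwei Tabletten pro Tag.",
        reference="Take two tablets per day.",
        good="Take two pills daily.",
        incorrect="Take ten tablets per day.",
    ),
    ChallengeExample(
        source="Der Vertrag endet im Mai.",
        reference="The contract ends in May.",
        good="The contract ends in May.",
        incorrect="The contract ends in March.",
    ),
]

tau = kendall_tau_like(token_overlap, examples)
print(tau)
```

In the first invented example the toy overlap metric prefers the wrong dosage because it shares more tokens with the reference, which is the kind of surface-level bias that recommendation (b) in the abstract warns against.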

