DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages


October 18, 2022



Gabriele Sarti, Arianna Bisazza, Ana Guerberof Arenas, Antonio Toral

From arXiv; published at EMNLP 2022. Materials: https://github.com/gsarti/divemt

We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keystrokes, editing times and pauses were recorded, enabling an in-depth, cross-lingual evaluation of NMT quality and post-editing effectiveness. Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity. We find that post-editing is consistently faster than translation from scratch. However, the magnitude of productivity gains varies widely across systems and languages, highlighting major disparities in post-editing effectiveness for languages at different degrees of typological relatedness to English, even when controlling for system architecture and training data size. We publicly release the complete dataset including all collected behavioral data, to foster new research on the translation capabilities of NMT systems for typologically diverse languages.
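The abstract frames post-editing effectiveness as the productivity gain of post-editing over translation from scratch, compared per system and per target language. The following is a minimal sketch of that comparison, assuming productivity is measured as source words per hour of editing time. The `EditRecord` schema and the modality labels (`"ht"` for translation from scratch, `"google"` and `"mbart"` for post-editing) are hypothetical and do not reflect the column names of the released dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EditRecord:
    # Hypothetical schema: one record per translated or post-edited segment.
    lang: str           # target language label, e.g. "ita" (illustrative)
    modality: str       # "ht" (from scratch) or an NMT system label (hypothetical)
    src_words: int      # number of English source words in the segment
    edit_time_s: float  # total editing time for the segment, in seconds


def words_per_hour(records):
    """Productivity as source words per hour of editing time (assumed metric)."""
    total_words = sum(r.src_words for r in records)
    total_hours = sum(r.edit_time_s for r in records) / 3600
    if total_hours == 0:
        raise ValueError("no editing time recorded")
    return total_words / total_hours


def productivity_gain(records, baseline="ht"):
    """Relative productivity gain of each modality over the baseline, per language.

    Returns {(lang, modality): gain}, where gain = wph(modality) / wph(baseline) - 1.
    Languages without baseline records are skipped.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r.lang, r.modality)].append(r)

    gains = {}
    for (lang, modality), recs in groups.items():
        if modality == baseline:
            continue
        base = groups.get((lang, baseline))
        if not base:
            continue
        gains[(lang, modality)] = words_per_hour(recs) / words_per_hour(base) - 1
    return gains


if __name__ == "__main__":
    # Toy numbers for illustration only; these are NOT DivEMT results.
    toy = [
        EditRecord("ita", "ht", 100, 600.0),
        EditRecord("ita", "google", 100, 400.0),
    ]
    print(productivity_gain(toy))  # gain of roughly 0.5 for ("ita", "google")
```

Grouping by `(lang, modality)` before comparing against the same language's from-scratch baseline is what exposes cross-language disparities: a single pooled speed-up figure would hide them.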

