评价突厥语多语种多语种NMT (Evaluating Multiway Multilingual NMT in the Turkic Languages) - 专知论文

会员服务 ·

0

多语言神经机器翻译 · Performer · 机器翻译 · MoDELS · NMT ·

2021 年 9 月 13 日

Evaluating Multiway Multilingual NMT in the Turkic Languages

翻译：评价突厥语多语种多语种NMT

Jamshidbek Mirzakhalov,Anoop Babu,Aigiz Kunafin,Ahsan Wahab,Behzod Moydinboyev,Sardana Ivanova,Mokhiyakhon Uzokova,Shaxnoza Pulatova,Duygu Ataman,Julia Kreutzer,Francis Tyers,Orhan Firat,John Licato,Sriram Chellappan

from arxiv, 9 pages, 3 figures, 7 tables. To be presented at WMT 2021

Despite the increasing number of large and comprehensive machine translation (MT) systems, evaluation of these methods in various languages has been restrained by the lack of high-quality parallel corpora as well as engagement with the people that speak these languages. In this study, we present an evaluation of state-of-the-art approaches to training and evaluating MT systems in 22 languages from the Turkic language family, most of which being extremely under-explored. First, we adopt the TIL Corpus with a few key improvements to the training and the evaluation sets. Then, we train 26 bilingual baselines as well as a multi-way neural MT (MNMT) model using the corpus and perform an extensive analysis using automatic metrics as well as human evaluations. We find that the MNMT model outperforms almost all bilingual baselines in the out-of-domain test sets and finetuning the model on a downstream task of a single pair also results in a huge performance boost in both low- and high-resource scenarios. Our attentive analysis of evaluation criteria for MT models in Turkic languages also points to the necessity for further research in this direction. We release the corpus splits, test sets as well as models to the public.

翻译：尽管大型和综合性机器翻译系统的数量不断增加,但由于缺乏高质量的平行公司以及同讲这些语言的人接触,对这些不同语言方法的评价受到限制。在本研究报告中,我们评估了土耳其语家庭22种语言的MT系统培训和评价的最先进方法,其中大部分是极低频的。首先,我们采用TIL Corpus,对培训和评估组合作了一些关键的改进。然后,我们培训了26个双语基线以及一个使用材料的多路神经MT模型,并利用自动计量以及人的评价进行了广泛的分析。我们发现MNMT模型几乎超越了外部测试中所有双语基线,并且对一对一对一的下游任务模式进行了微调,这也极大地促进了低、高资源情景的情景。我们仔细分析了突厥语模型的评估标准,也指出了进一步研究这一方向的必要性。我们发布了分解模型,作为公共模型的测试。

0

相关内容

多语言神经机器翻译

多语言神经机器翻译

多语言机器翻译使用一个翻译模型来处理多种语言。

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

专知会员服务

54+阅读 · 2020年7月18日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec智能推荐

50+阅读 · 2018年8月27日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

专知

8+阅读 · 2018年6月21日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

机器翻译 | Bleu：此蓝;非彼蓝

机器翻译 | Bleu：此蓝;非彼蓝

黑龙江大学自然语言处理实验室

4+阅读 · 2018年3月14日

【推荐】卷积神经网络类间不平衡问题系统研究

【推荐】卷积神经网络类间不平衡问题系统研究

机器学习研究会

6+阅读 · 2017年10月18日

MT3: Multi-Task Multitrack Music Transcription

MT3: Multi-Task Multitrack Music Transcription

Arxiv

0+阅读 · 2021年11月4日

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

Arxiv

0+阅读 · 2021年11月3日

Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval

Arxiv

0+阅读 · 2021年11月3日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Arxiv

3+阅读 · 2020年3月24日

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

Arxiv

4+阅读 · 2018年4月26日

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

Arxiv

3+阅读 · 2018年4月16日

Joint Training for Neural Machine Translation Models with Monolingual Data

Arxiv

4+阅读 · 2018年3月1日

What Level of Quality can Neural Machine Translation Attain on Literary Text?

Arxiv

5+阅读 · 2018年1月15日

Attention Is All You Need

Arxiv

27+阅读 · 2017年12月6日

VIP会员

文章信息

相关主题

多语言神经机器翻译

相关VIP内容

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

【ICML2020】机器学习产品生产部署流程，54页ppt讲述实际ML生产部署

专知会员服务

54+阅读 · 2020年7月18日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec智能推荐

50+阅读 · 2018年8月27日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

【论文推荐】最新九篇机器翻译相关论文—深度多任务学习、深度RNNs、注意焦点、多源神经机器翻译

专知

8+阅读 · 2018年6月21日

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

【论文推荐】最新十篇机器翻译相关论文—自然语言推理、无监督神经机器翻译、多任务学习、局部卷积、图卷积、多语种机器翻译

专知

15+阅读 · 2018年5月1日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

机器翻译 | Bleu：此蓝;非彼蓝

机器翻译 | Bleu：此蓝;非彼蓝

黑龙江大学自然语言处理实验室

4+阅读 · 2018年3月14日

【推荐】卷积神经网络类间不平衡问题系统研究

【推荐】卷积神经网络类间不平衡问题系统研究

机器学习研究会

6+阅读 · 2017年10月18日

相关论文

MT3: Multi-Task Multitrack Music Transcription

MT3: Multi-Task Multitrack Music Transcription

Arxiv

0+阅读 · 2021年11月4日

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

Arxiv

0+阅读 · 2021年11月3日

Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval

Arxiv

0+阅读 · 2021年11月3日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Arxiv

3+阅读 · 2020年3月24日

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation

Arxiv

4+阅读 · 2018年4月26日

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

Arxiv

3+阅读 · 2018年4月16日

Joint Training for Neural Machine Translation Models with Monolingual Data

Arxiv

4+阅读 · 2018年3月1日

What Level of Quality can Neural Machine Translation Attain on Literary Text?

Arxiv

5+阅读 · 2018年1月15日

Attention Is All You Need

Arxiv

27+阅读 · 2017年12月6日

微信扫码咨询专知VIP会员