MTet: Multi-domain Translation for English and Vietnamese


October 19, 2022


Chinh Ngo, Trieu H. Trinh, Long Phan, Hieu Tran, Tai Dang, Hieu Nguyen, Minh Nguyen, Minh-Thang Luong

We introduce MTet, the largest publicly available parallel corpus for English-Vietnamese translation. MTet consists of 4.2M high-quality training sentence pairs and a multi-domain test set refined by the Vietnamese research community. Combining MTet with previous work on English-Vietnamese translation, we grow the existing parallel dataset to 6.2M sentence pairs. We also release EnViT5, the first pretrained model for English and Vietnamese. Using both resources, our model outperforms previous state-of-the-art results by up to 2 BLEU points in translation while being 1.6 times smaller.
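For readers unfamiliar with the metric behind the "up to 2 points" claim, the following is a minimal sketch of corpus-level BLEU. It assumes whitespace tokenization, a single reference per hypothesis, and no smoothing. The function names `ngrams` and `corpus_bleu` are illustrative and are not taken from the paper. Published scores normally come from standardized tooling with its own tokenization rules, so numbers produced by this sketch are not comparable to the reported results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not from the paper): whitespace tokenization,
    exactly one reference per hypothesis, no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], refs))        # exact match
    print(corpus_bleu(["the cat sat on the mat today"], refs))  # partial match
```

On this scale, a 2-point gain means the geometric mean of the clipped n-gram precisions, after the brevity penalty, rises by 0.02 in absolute terms.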

