Natural language understanding and generation models follow one of two dominant architectural paradigms: language models (LMs), which process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec), which use separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, and few studies have investigated the performance of LMs. In this work, we thoroughly examine the role of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual, and zero-shot translation tasks, under systematic variations of data conditions and model sizes. Our results show that: (i) different LMs have different scaling properties: architectural differences often have a significant impact on model performance at small scales, but the performance gap narrows as the number of parameters increases; (ii) several design choices, including causal masking and language-modeling objectives on the source sequence, have detrimental effects on translation quality; and (iii) when paired with full-visible masking over source sequences, LMs can perform on par with EncDec on supervised bilingual and multilingual translation tasks, and improve greatly on zero-shot directions by reducing off-target translations.
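The contrast between causal masking and full-visible masking over the source sequence can be made concrete with the attention masks themselves. Below is a minimal illustrative sketch (not from the paper; function names are hypothetical): a fully causal mask lets each position attend only to earlier positions, while a prefix-LM mask grants bidirectional visibility within the source prefix and keeps the target causal.

```python
def causal_mask(seq_len):
    # Fully causal: position i may attend only to positions j <= i.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def prefix_lm_mask(src_len, tgt_len):
    # Full-visible masking over the source prefix: every position may attend
    # to the entire source; target positions remain causal among themselves.
    total = src_len + tgt_len
    return [[j < src_len or j <= i for j in range(total)]
            for i in range(total)]
```

With `src_len=3, tgt_len=2`, source position 0 can attend to source position 2 under the prefix-LM mask but not under the causal mask, while target positions still cannot attend to future target tokens.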