Although transformer architectures have shown dominance in many natural language understanding tasks, unsolved issues remain in training transformer models, notably the need for a principled warm-up schedule, which has proven important for stable training, and the question of whether the task at hand prefers a scaled attention product. In this paper, we empirically explore automating the design choices in the transformer model, i.e., how to place layer normalization, whether to scale the attention product, the number of layers, the number of heads, the activation function, etc., so that one can obtain a transformer architecture that better suits the task at hand. Reinforcement learning (RL) is employed to navigate the search space, and special parameter-sharing strategies are designed to accelerate the search. We show that sampling a proportion of the training data per epoch during the search helps improve search quality. Experiments on CoNLL03, Multi-30k, IWSLT14 and WMT-14 show that the searched transformer models outperform standard transformers. In particular, we show that our learned model can be trained more robustly with large learning rates and without warm-up.
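To make two of the searched design choices concrete, the following is a minimal sketch, assuming a standard PyTorch setup, of a self-attention sublayer with switchable pre-/post-layer-norm placement and an optional 1/sqrt(d_k) scaling of the attention product. The class name ConfigurableSelfAttention and its arguments are hypothetical illustrations, not the paper's implementation.

```python
# A minimal sketch (not the paper's implementation) of two searched choices:
# pre- vs. post-layer-norm placement, and whether to scale the attention
# product by 1/sqrt(d_k). The class and argument names are hypothetical.
import math
import torch
import torch.nn as nn

class ConfigurableSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int,
                 scale_attention: bool = True, pre_layernorm: bool = True):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_head = d_model // n_heads
        self.n_heads = n_heads
        self.scale_attention = scale_attention
        self.pre_layernorm = pre_layernorm
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        residual = x
        if self.pre_layernorm:  # pre-LN: normalize before the sublayer
            x = self.norm(x)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each of q, k, v to (batch, heads, seq_len, d_head)
        q, k, v = (z.reshape(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1)
        if self.scale_attention:  # searched choice: scaled dot product
            scores = scores / math.sqrt(self.d_head)
        attn = scores.softmax(dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        y = residual + self.out(y)
        if not self.pre_layernorm:  # post-LN: normalize after the residual
            y = self.norm(y)
        return y
```

If the abstract's robustness claim follows the commonly reported pattern, the pre_layernorm=True variant would be the one expected to tolerate large learning rates without warm-up; this is an assumption based on prior literature on pre-LN transformers, not a detail stated in the abstract itself.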