变换器中跨层共享参数的经验教训 (Lessons on Parameter Sharing across Layers in Transformers) - 专知论文

会员服务 ·

0

参数共享 · 层 · 变换 · 训练数据 · 大学 ·

2022 年 4 月 21 日

Lessons on Parameter Sharing across Layers in Transformers

翻译：变换器中跨层共享参数的经验教训

Sho Takase,Shun Kiyono

We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares parameters for one layer with all layers such as Universal Transformers (Dehghani et al., 2019), to increase the efficiency in the computational time. We propose three strategies: Sequence, Cycle, and Cycle (rev) to assign parameters to each layer. Experimental results show that the proposed strategies are efficient in the parameter size and computational time. Moreover, we indicate that the proposed strategies are also effective in the configuration where we use many training data such as the recent WMT competition.

翻译：我们为变换器提出了一种参数共享方法(Vaswani等人,2017年)。拟议方法放宽了一种广泛使用的技术,即与通用变换器等所有层共享一个层的参数(Dehghani等人,2019年),以提高计算时间的效率。我们提出了三种战略:序列、循环和周期(rev),为每个层指定参数。实验结果显示,拟议战略在参数大小和计算时间方面是有效的。此外,我们指出,拟议战略在配置中也是有效的,我们使用了许多培训数据,例如最近的WMT竞赛。

0

相关内容

参数共享

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

活性氧调控植物根生长的分子机理

国家自然科学基金

0+阅读 · 2015年12月31日

新型Plectin-1荧光、MRI靶向分子探针对胰腺癌早期诊断的实验研究

国家自然科学基金

0+阅读 · 2014年12月31日

BAG3与MACC1相互作用在甲状腺癌细胞上皮间质转化(EMT) 及侵袭中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

《物理》期刊

国家自然科学基金

4+阅读 · 2013年2月4日

用于同步辐射测量的阻性MicroMEGAS研究

国家自然科学基金

0+阅读 · 2012年12月31日

Fourier型标架与分形谱测度

国家自然科学基金

0+阅读 · 2012年12月31日

稳定度条件与环的正则性、clean性

国家自然科学基金

0+阅读 · 2012年12月31日

纳米材料形貌可控合成机理的原位同步辐射研究

国家自然科学基金

0+阅读 · 2012年12月31日

可压和不可压Navier-Stokes方程的一些问题

国家自然科学基金

0+阅读 · 2009年12月31日

NAGphormer: Neighborhood Aggregation Graph Transformer for Node Classification in Large Graphs

Arxiv

0+阅读 · 2022年6月10日

Revisiting Unsupervised Meta-Learning via the Characteristics of Few-Shot Tasks

Arxiv

0+阅读 · 2022年6月9日

Cluster Builder -- A DSL to Deploy a Parallel Application Over a Workstation Cluster

Cluster Builder -- A DSL to Deploy a Parallel Application Over a Workstation Cluster

Arxiv

0+阅读 · 2022年6月9日

SwinCheX: Multi-label classification on chest X-ray images with transformers

Arxiv

0+阅读 · 2022年6月9日

Escaping the Big Data Paradigm with Compact Transformers

Arxiv

0+阅读 · 2022年6月7日

Parametric Chordal Sparsity for SDP-based Neural Network Verification

Arxiv

0+阅读 · 2022年6月7日

The Spike Gating Flow: A Hierarchical Structure Based Spiking Neural Network for Online Gesture Recognition

Arxiv

0+阅读 · 2022年6月7日

Artificial Intelligence for the Metaverse: A Survey

Arxiv

31+阅读 · 2022年2月15日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Arxiv

16+阅读 · 2017年11月20日

VIP会员

文章信息

相关主题

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《美国太空军系统全生命周期建模、仿真与分析效能提升方案》最新84页报告

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

自主人工智能：未来战争是否将是自主化的？

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Multi-Task Learning的几篇综述文章

Multi-Task Learning的几篇综述文章

深度学习自然语言处理

15+阅读 · 2020年6月15日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

相关论文

NAGphormer: Neighborhood Aggregation Graph Transformer for Node Classification in Large Graphs

Arxiv

0+阅读 · 2022年6月10日

Revisiting Unsupervised Meta-Learning via the Characteristics of Few-Shot Tasks

Arxiv

0+阅读 · 2022年6月9日

Cluster Builder -- A DSL to Deploy a Parallel Application Over a Workstation Cluster

Cluster Builder -- A DSL to Deploy a Parallel Application Over a Workstation Cluster

Arxiv

0+阅读 · 2022年6月9日

SwinCheX: Multi-label classification on chest X-ray images with transformers

Arxiv

0+阅读 · 2022年6月9日

Escaping the Big Data Paradigm with Compact Transformers

Arxiv

0+阅读 · 2022年6月7日

Parametric Chordal Sparsity for SDP-based Neural Network Verification

Arxiv

0+阅读 · 2022年6月7日

The Spike Gating Flow: A Hierarchical Structure Based Spiking Neural Network for Online Gesture Recognition

Arxiv

0+阅读 · 2022年6月7日

Artificial Intelligence for the Metaverse: A Survey

Arxiv

31+阅读 · 2022年2月15日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Arxiv

16+阅读 · 2017年11月20日

相关基金

活性氧调控植物根生长的分子机理

国家自然科学基金

0+阅读 · 2015年12月31日

新型Plectin-1荧光、MRI靶向分子探针对胰腺癌早期诊断的实验研究

国家自然科学基金

0+阅读 · 2014年12月31日

BAG3与MACC1相互作用在甲状腺癌细胞上皮间质转化(EMT) 及侵袭中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

《物理》期刊

国家自然科学基金

4+阅读 · 2013年2月4日

用于同步辐射测量的阻性MicroMEGAS研究

国家自然科学基金

0+阅读 · 2012年12月31日

Fourier型标架与分形谱测度

国家自然科学基金

0+阅读 · 2012年12月31日

稳定度条件与环的正则性、clean性

国家自然科学基金

0+阅读 · 2012年12月31日

纳米材料形貌可控合成机理的原位同步辐射研究

国家自然科学基金

0+阅读 · 2012年12月31日

可压和不可压Navier-Stokes方程的一些问题

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员