We propose a simple and efficient approach to training the BERT model. Our approach exploits the special structure of BERT, which consists of a stack of repeated modules (i.e., Transformer encoders). We first train BERT with the weights shared across all repeated modules up to a certain point, so that the model learns the component of the weights that is common to all repeated layers. We then stop weight sharing and continue training until convergence. We present theoretical insights into training by sharing weights and then unsharing them, supported by an analysis of simplified models. Empirical experiments on BERT show that our method yields better-performing trained models and significantly reduces the number of training iterations.
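The following is a minimal sketch, not the authors' implementation, of the share-then-unshare training schedule on a toy stack of Transformer encoder layers in PyTorch. The layer sizes, dummy objective, optimizer, and the switch point `T_share` are illustrative assumptions.

```python
# Sketch of the share-then-unshare schedule (illustrative only).
import copy
import torch
import torch.nn as nn

class StackedEncoder(nn.Module):
    def __init__(self, num_layers=12, d_model=768, nhead=12, shared=True):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        if shared:
            layer = make()
            # Every position references the same module, so one set of
            # weights is reused by all repeated layers.
            self.layers = nn.ModuleList([layer] * num_layers)
        else:
            self.layers = nn.ModuleList([make() for _ in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def unshare(model):
    """Replace the shared layer with independent deep copies so that
    subsequent gradient updates are no longer tied across layers."""
    model.layers = nn.ModuleList([copy.deepcopy(l) for l in model.layers])
    return model

# Phase 1: train with shared weights until step T_share (assumed hyperparameter).
# Phase 2: unshare and continue training until convergence.
model = StackedEncoder(shared=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
T_share = 10_000  # illustrative switch point

for step in range(20_000):                 # placeholder training loop
    x = torch.randn(8, 128, 768)           # dummy input batch
    loss = model(x).pow(2).mean()          # dummy objective (stands in for MLM loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step + 1 == T_share:
        model = unshare(model)
        # Rebuild the optimizer so it tracks the now-independent parameters.
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```

In this sketch, phase 1 updates a single set of encoder weights that receives gradients from every layer position, and `unshare` copies those weights into independent modules so phase 2 can fine-tune each layer separately.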