Recurrent neural networks (RNNs) are an important class of neural network architectures for language modeling and sequential prediction. However, optimizing RNNs is known to be harder than optimizing feed-forward neural networks. A number of techniques have been proposed in the literature to address this problem. In this paper we propose a simple technique called fraternal dropout that takes advantage of dropout to achieve this goal. Specifically, we propose to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions. In this way our regularization encourages the representations of RNNs to be invariant to the dropout mask, and thus more robust. We show that our regularization term is upper-bounded by the expectation-linear dropout objective, which has been shown to address the gap arising from the difference between the training and inference phases of dropout. We evaluate our model and achieve state-of-the-art results in sequence modeling on two benchmark datasets, Penn Treebank and Wikitext-2. We also show that our approach improves performance by a significant margin on image captioning (Microsoft COCO) and semi-supervised (CIFAR-10) tasks.
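To make the objective concrete, below is a minimal, hypothetical PyTorch-style sketch of the idea described above: the same input is passed twice through one shared-parameter model so the two forward passes see independent dropout masks, and the squared difference between the pre-softmax outputs is added to the usual prediction loss. The model, the coefficient `kappa`, and the data names are illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative sketch of a fraternal-dropout-style objective (assumed names, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, emb=128, hidden=256, p_drop=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.drop = nn.Dropout(p_drop)      # samples a fresh mask on every forward pass
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.drop(self.embed(x)))
        return self.out(self.drop(h))       # pre-softmax logits

def fraternal_dropout_loss(model, x, y, kappa=0.1):
    # Two forward passes through the SAME model (shared parameters), different dropout masks.
    logits_a = model(x)
    logits_b = model(x)
    # Average the usual prediction losses of the two passes.
    ce = 0.5 * (F.cross_entropy(logits_a.transpose(1, 2), y) +
                F.cross_entropy(logits_b.transpose(1, 2), y))
    # Penalize disagreement between the pre-softmax predictions.
    reg = kappa * (logits_a - logits_b).pow(2).mean()
    return ce + reg

# Illustrative usage with random token ids (batch of 4, sequence length 20).
model = TinyLM()
x = torch.randint(0, 1000, (4, 20))
y = torch.randint(0, 1000, (4, 20))
loss = fraternal_dropout_loss(model, x, y)
loss.backward()
```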