重新加权 Coin 的翻转侧面: 适应性辍学和常规化的质量 (The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization) - 专知论文

会员服务 ·

0

暂退法 · 正则化项 · 稀疏化 · Weight · CASE ·

2021 年 6 月 14 日

The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization

翻译：重新加权 Coin 的翻转侧面: 适应性辍学和常规化的质量

Daniel LeJeune,Hamid Javadi,Richard G. Baraniuk

from arxiv, 19 pages, 2 figures. Submitted to NeurIPS 2021

Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called "$\eta$-trick" that casts both as iteratively reweighted optimizations. We show that any dropout strategy that adapts to the weights in a monotonic way corresponds to an effective subquadratic regularization penalty, and therefore leads to sparse solutions. We obtain the effective penalties for several popular sparsification strategies, which are remarkably similar to classical penalties commonly used in sparse optimization. Considering variational dropout as a case study, we demonstrate similar empirical behavior between the adaptive dropout method and classical methods on the task of deep network sparsification, validating our theory.

翻译：渗透深层(神经)网络的最成功方法就是那些适应性地掩盖整个培训过程中网络重量的方法。通过检查这种蒙面或辍学,在线性案例中,我们发现了这种适应性方法与通过所谓的“$\eta$-trick”实现正规化的双重性,两者的双重性既表现为迭代的再加权优化。我们表明,任何适应单调方式的重量的辍学策略都相当于有效的次水下规范化处罚,因此导致零散的解决方案。我们为一些流行的垃圾化策略获得有效的处罚,这些策略与稀疏优化通常使用的典型处罚非常相似。考虑到作为一种案例研究的变异性辍学,我们展示了适应性退出方法与深网络聚变的经典方法之间的类似经验行为,从而验证了我们的理论。

0

相关内容

暂退法

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【Google-普林斯顿】从学习速率中解开自适应梯度法，Disentangling Adaptive Gradient

专知会员服务

19+阅读 · 2020年3月5日

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

专知会员服务

34+阅读 · 2020年3月4日

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

专知会员服务

34+阅读 · 2020年2月27日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

282+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

深度卷积神经网络中的降采样

深度卷积神经网络中的降采样

极市平台

12+阅读 · 2019年5月24日

已删除

德先生

53+阅读 · 2019年4月28日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Neural-to-Tree Policy Distillation with Policy Improvement Criterion

Arxiv

0+阅读 · 2021年8月16日

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

Arxiv

0+阅读 · 2021年8月15日

Deeper or Wider Networks of Point Clouds with Self-attention?

Arxiv

0+阅读 · 2021年8月14日

A new omnibus test of fit based on a characterisation of the uniform distribution

Arxiv

0+阅读 · 2021年8月13日

A posteriori error analysis of a local adaptive discontinuous Galerkin method for convection-diffusion-reaction equations

Arxiv

0+阅读 · 2021年8月12日

Deep Stable Learning for Out-Of-Distribution Generalization

Arxiv

13+阅读 · 2021年4月16日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Linkage between Piecewise Constant Mumford-Shah model and ROF model and its virtue in image segmentation

Linkage between Piecewise Constant Mumford-Shah model and ROF model and its virtue in image segmentation

Arxiv

4+阅读 · 2018年7月26日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

VIP会员

文章信息

相关主题

相关VIP内容

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【Google-普林斯顿】从学习速率中解开自适应梯度法，Disentangling Adaptive Gradient

专知会员服务

19+阅读 · 2020年3月5日

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

专知会员服务

34+阅读 · 2020年3月4日

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

【ICLR2020】深度神经网络优化轨迹的平衡点，The Break-Even Point on Optimization Trajectories of Deep Neural Networks

专知会员服务

34+阅读 · 2020年2月27日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

282+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】多目标奖励与偏好优化：理论与算法

《无形的防御者？将定向能武器集成到反无人机框架的机遇与挑战》报告

自主化海军：海上无人系统与未来海战

迈向智能体系统规模化的科学

相关资讯

深度卷积神经网络中的降采样

深度卷积神经网络中的降采样

极市平台

12+阅读 · 2019年5月24日

已删除

德先生

53+阅读 · 2019年4月28日

19篇ICML2019论文摘录选读！

19篇ICML2019论文摘录选读！

专知

28+阅读 · 2019年4月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Neural-to-Tree Policy Distillation with Policy Improvement Criterion

Arxiv

0+阅读 · 2021年8月16日

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

Arxiv

0+阅读 · 2021年8月15日

Deeper or Wider Networks of Point Clouds with Self-attention?

Arxiv

0+阅读 · 2021年8月14日

A new omnibus test of fit based on a characterisation of the uniform distribution

Arxiv

0+阅读 · 2021年8月13日

A posteriori error analysis of a local adaptive discontinuous Galerkin method for convection-diffusion-reaction equations

Arxiv

0+阅读 · 2021年8月12日

Deep Stable Learning for Out-Of-Distribution Generalization

Arxiv

13+阅读 · 2021年4月16日

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Arxiv

23+阅读 · 2021年3月3日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Linkage between Piecewise Constant Mumford-Shah model and ROF model and its virtue in image segmentation

Linkage between Piecewise Constant Mumford-Shah model and ROF model and its virtue in image segmentation

Arxiv

4+阅读 · 2018年7月26日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

微信扫码咨询专知VIP会员