Over-parameterized deep networks trained with gradient-based optimizers are a popular choice for solving classification and ranking problems. Without appropriately tuned $\ell_2$ regularization or weight decay, such networks tend to make their output scores (logits) and weights large, causing the training loss to become too small and the network to lose its adaptivity (its ability to move around) in parameter space. Although regularization is typically understood from an overfitting perspective, we highlight its role in making the network more adaptive and enabling it to escape more easily from weights that generalize poorly. To provide such a capability, we propose a method called Logit Attenuating Weight Normalization (LAWN), which can be stacked onto any gradient-based optimizer. LAWN controls the logits by constraining the weight norms of the layers in the final homogeneous sub-network. Empirically, we show that the resulting LAWN variant of an optimizer makes a deep network more adaptive, leading to minima with superior generalization performance on large-scale image classification and recommender systems. While LAWN is particularly impressive in improving Adam, it greatly improves all optimizers when used with large batch sizes.
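To illustrate the kind of constraint the abstract describes (not the paper's exact procedure), the following PyTorch-style sketch projects the weights of a final linear layer back onto a fixed-norm ball after each optimizer step, which bounds the logit scale. The norm budget `max_norm`, the choice of constraining only the last layer, and the helper names are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def project_weight_norm(layer: nn.Linear, max_norm: float) -> None:
    """Rescale the layer's weight in place so its Frobenius norm does not
    exceed max_norm, keeping the logits from growing without bound.
    (Illustrative constraint; LAWN's exact scheme is defined in the paper.)"""
    with torch.no_grad():
        w_norm = layer.weight.norm()
        if w_norm > max_norm:
            layer.weight.mul_(max_norm / w_norm)

# Minimal usage sketch: constrain only the final (homogeneous) layer.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    project_weight_norm(model[-1], max_norm=1.0)  # hypothetical norm budget
    return loss.item()
```

The projection can be applied after any base optimizer's update, which is consistent with the abstract's claim that LAWN can be stacked onto any gradient-based optimizer.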