Altering backward pass gradients improves convergence

May 6, 2022

Bishshoy Das, Milton Mondal, Brejesh Lall, Shiv Dutt Joshi, Sumantra Dutta Roy

In typical neural network training, the gradients in the backward pass are determined by the forward pass, so the two stages are coupled. However, neural networks often perform worse when gradients explode or vanish. To address this, approaches such as Gradient Clipping (GC) and Adaptive Gradient Clipping (AGC) have been developed to improve the gradient behaviour of networks without normalization layers during the backward pass. These techniques decouple the backward pass from the forward pass and modify the gradients adaptively. A possible drawback of clipping approaches is that they must be computed for every weight tensor in every layer. We propose the PowerGrad Transform (PGT), a comparable approach that alters and improves the gradient flow in the backward pass but is computed only at the final softmax layer. It is computationally very efficient and outperforms both GC and AGC, improving the performance of networks without batch normalization. PGT is easy to integrate into existing networks, requiring only a few lines of code, and significantly improves performance in non-BN ResNets. The effect is more pronounced on large datasets such as ImageNet, where networks do not fit all of the training data and there is still some headroom in training. PGT enables the network to fit the training data better while simultaneously improving its performance on the test set.
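The cost argument above is that clipping has to be evaluated for each weight tensor. The sketch below, in plain Python with 2-D tensors stored as lists of rows, shows one common form of GC (rescaling all gradients together by their global L2 norm) and unit-wise AGC as formulated by Brock et al. for normalizer-free networks, where each gradient row is rescaled when its norm exceeds `clip` times the norm of the matching weight row. The function names and the defaults `clip=0.01` and `eps=1e-3` are illustrative choices, not values taken from this paper.

```python
import math
from typing import List

# A 2-D tensor stored as a list of rows; for a dense layer, one row per output unit.
Matrix = List[List[float]]


def frobenius_norm(t: Matrix) -> float:
    """L2 norm over every entry of a 2-D tensor."""
    return math.sqrt(sum(x * x for row in t for x in row))


def clip_by_global_norm(grads: List[Matrix], max_norm: float) -> List[Matrix]:
    """Gradient clipping (GC), global-norm variant: scale all gradients by the
    same factor so that their combined L2 norm is at most max_norm."""
    total = math.sqrt(sum(frobenius_norm(g) ** 2 for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))  # 1e-12 guards against 0/0
    return [[[x * scale for x in row] for row in g] for g in grads]


def adaptive_clip(weight: Matrix, grad: Matrix, clip: float = 0.01, eps: float = 1e-3) -> Matrix:
    """Unit-wise adaptive gradient clipping (AGC) for ONE weight tensor.
    Row i of the gradient is rescaled to norm clip * max(||W_i||, eps)
    whenever its norm exceeds that bound; other rows are left unchanged."""
    out = []
    for w_row, g_row in zip(weight, grad):
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clip * w_norm
        if g_norm > max_norm:  # g_norm > 0 here, so the division is safe
            g_row = [g * (max_norm / g_norm) for g in g_row]
        out.append(list(g_row))
    return out


def agc_all_layers(weights: List[Matrix], grads: List[Matrix], clip: float = 0.01) -> List[Matrix]:
    """AGC needs one evaluation per weight tensor, in every layer."""
    return [adaptive_clip(w, g, clip) for w, g in zip(weights, grads)]
```

`agc_all_layers` has to visit every parameter tensor on every training step; that per-layer overhead is what PGT is said to avoid by acting only at the softmax layer.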

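The abstract does not give PGT's formula, only that it changes the backward-pass gradient at the final softmax layer while the forward pass stays as it is. As a hedged illustration of that kind of decoupling, the sketch below keeps the ordinary softmax cross-entropy loss in the forward pass but, in the backward pass, replaces the probabilities p with a power-transformed, renormalised version p_i^α / Σ_j p_j^α before forming the usual p - y gradient with respect to the logits. The exponent `alpha` and this exact transform are assumptions made for illustration, not a confirmed statement of the authors' method; with `alpha=1.0` the standard gradient is recovered.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_forward(logits: List[float], target: int) -> Tuple[float, List[float]]:
    """Forward pass: the ordinary softmax cross-entropy loss, unchanged."""
    p = softmax(logits)
    return -math.log(p[target]), p


def minus_one_hot(probs: List[float], target: int) -> List[float]:
    """Return probs - onehot(target), the form of a softmax-CE logit gradient."""
    return [q - (1.0 if i == target else 0.0) for i, q in enumerate(probs)]


def standard_backward(p: List[float], target: int) -> List[float]:
    """Coupled backward pass: dL/dz = p - y, fully determined by the forward pass."""
    return minus_one_hot(p, target)


def power_transformed_backward(p: List[float], target: int, alpha: float = 0.5) -> List[float]:
    """HYPOTHETICAL decoupled backward pass (assumed transform, not confirmed
    as the paper's PGT): use p_hat_i = p_i**alpha / sum_j p_j**alpha in place of p.
    alpha < 1 flattens p_hat; alpha = 1 gives back the standard gradient."""
    powered = [q ** alpha for q in p]
    total = sum(powered)
    p_hat = [q / total for q in powered]
    return minus_one_hot(p_hat, target)


loss, p = cross_entropy_forward([2.0, 1.0, 0.1], target=0)
print(round(loss, 3))
print([round(g, 3) for g in standard_backward(p, 0)])
print([round(g, 3) for g in power_transformed_backward(p, 0, alpha=0.5)])
```

In a framework such as PyTorch, this split would typically live in a custom `torch.autograd.Function` whose `forward` returns the loss and whose `backward` returns the modified logit gradient. Because only the last layer is touched, the extra cost does not grow with network depth.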