As the problems optimized with deep learning become more practical, their datasets inevitably contain various kinds of noise, such as mislabeling and substitution by estimated inputs/outputs, which can negatively affect the optimization results. As a safety net, it is natural to improve the stochastic gradient descent (SGD) optimizer, which updates the network parameters as the final step of learning, so that it becomes more robust to noise. Previous work showed that the first momentum used in Adam-like SGD optimizers can be modified based on the noise-robust Student's t-distribution, thereby inheriting its robustness to noise. In this paper, we propose AdaTerm, which derives not only the first momentum but also all the other involved statistics from the Student's t-distribution. If the computed gradients appear to be aberrant, AdaTerm is expected to exclude them from the update and to reinforce the robustness for the subsequent updates; otherwise, it updates the network parameters normally and can relax the robustness for the subsequent updates. With this noise-adaptive behavior, the excellent learning performance of AdaTerm was confirmed on typical optimization problems under several settings with different noise ratios.
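To illustrate the noise-adaptive behavior described above, the following is a minimal sketch of a Student's-t-weighted moving average, not the exact AdaTerm update rule: a gradient that deviates strongly from the running statistics receives a small weight and therefore contributes little to the updated momentum, while ordinary gradients are incorporated almost as in a standard exponential moving average. The function name and the parameters `nu`, `beta`, and `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np

def t_weighted_momentum(m, v, grad, nu=5.0, beta=0.9, eps=1e-8):
    """One illustrative step of a Student's-t-weighted moving average (sketch only).

    m, v : running estimates of the gradient mean and variance (1-D arrays)
    grad : newly computed (possibly noisy) gradient (1-D array)
    nu   : degrees of freedom; smaller values reject outliers more strongly
    beta : base decay of the moving average, as in Adam-like optimizers
    """
    d = grad.size
    # Scaled squared deviation of the new gradient from the running mean;
    # a large value suggests the gradient is probably aberrant.
    dev = np.sum((grad - m) ** 2 / (v + eps))
    # Student's-t weight: close to 1 for ordinary gradients, small for outliers.
    w = (nu + d) / (nu + dev)
    # Effective interpolation factor: aberrant gradients (small w) barely
    # change the statistics, ordinary gradients update them as usual.
    k = (1.0 - beta) * w
    m_new = (1.0 - k) * m + k * grad
    v_new = (1.0 - k) * v + k * (grad - m) ** 2
    return m_new, v_new
```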