Attentional-Biased Stochastic Gradient Descent - Zhuanzhi Papers

December 25, 2022

Attentional-Biased Stochastic Gradient Descent

Qi Qi, Yi Xu, Rong Jin, Wotao Yin, Tianbao Yang

From arXiv, 24 pages

In this paper, we present a simple yet effective method (ABSGD) for addressing the data imbalance issue in deep learning. Our method is a simple modification to momentum SGD in which we leverage an attentional mechanism to assign an individual importance weight to each gradient in the mini-batch. Unlike many existing heuristic-driven methods for tackling data imbalance, our method is grounded in theoretically justified distributionally robust optimization (DRO), and it is guaranteed to converge to a stationary point of an information-regularized DRO problem. The individual-level weight of a sampled data point is systematically proportional to the exponential of a scaled loss value of that data point, where the scaling factor is interpreted as the regularization parameter in the framework of information-regularized DRO. Compared with existing class-level weighting schemes, our method can capture the diversity between individual examples within each class. Compared with existing individual-level weighting methods using meta-learning, which require three backward propagations for computing mini-batch stochastic gradients, our method is more efficient, with only one backward propagation at each iteration as in standard deep learning methods. To balance the learning of the feature extraction layers against the learning of the classifier layer, we employ a two-stage method that uses SGD for pretraining, followed by ABSGD for learning a robust classifier and finetuning the lower layers. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method.
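The weighting rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `attentional_weights` and `weighted_momentum_step`, the parameter name `lam` for the scaling factor, and the choice to normalize the weights within the mini-batch are all assumptions made for illustration. The abstract only states that each weight is proportional to the exponential of the scaled loss. The momentum update follows one common convention (the velocity accumulates the gradient, then the parameters move against it), and per-example gradients are passed in directly to keep the example in pure Python.

```python
import math


def attentional_weights(losses, lam):
    """Weights proportional to exp(loss / lam), normalized to sum to 1.

    `lam` plays the role of the scaling (regularization) factor; normalizing
    within the mini-batch is an assumption of this sketch.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    scaled = [loss / lam for loss in losses]
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the normalized weights unchanged.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_momentum_step(params, velocity, per_example_grads, losses,
                           lam, lr=0.1, beta=0.9):
    """One momentum-SGD step on the attention-weighted mini-batch gradient.

    `per_example_grads[i][j]` is the gradient of example i's loss with
    respect to parameter j (hypothetical input layout for this sketch).
    """
    weights = attentional_weights(losses, lam)
    new_params, new_velocity = [], []
    for j in range(len(params)):
        # Weighted combination of per-example gradients for parameter j.
        g = sum(w * grad[j] for w, grad in zip(weights, per_example_grads))
        v = beta * velocity[j] + g
        new_velocity.append(v)
        new_params.append(params[j] - lr * v)
    return new_params, new_velocity


if __name__ == "__main__":
    # Two easy examples and one hard example: the hard one gets more weight.
    print(attentional_weights([0.2, 0.2, 2.0], lam=1.0))
    # A very large lam makes the weights nearly uniform.
    print(attentional_weights([0.2, 0.2, 2.0], lam=1e6))
```

In an automatic-differentiation framework, the same effect is typically obtained by treating the weights as constants and backpropagating once through the weighted sum of per-example losses, which is consistent with the single backward propagation per iteration mentioned above. A small `lam` concentrates weight on high-loss examples, while a large `lam` approaches uniform averaging, that is, ordinary momentum SGD.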
