A Stochastic Proximal Polyak Step Size - 专知 Papers


Regularization term · Objective function · Functional · Tuning · Extensibility

January 12, 2023


Fabian Schaipp,Robert M. Gower,Michael Ulbrich

Recently, the stochastic Polyak step size (SPS) has emerged as a competitive adaptive step size scheme for stochastic gradient descent. Here we develop ProxSPS, a proximal variant of SPS that can handle regularization terms. Developing a proximal variant of SPS is particularly important, since SPS requires a lower bound of the objective function to work well. When the objective function is the sum of a loss and a regularizer, available estimates of a lower bound of the sum can be loose. In contrast, ProxSPS only requires a lower bound for the loss, which is often readily available. As a consequence, we show that ProxSPS is easier to tune and more stable in the presence of regularization. Furthermore, for image classification tasks, ProxSPS performs as well as AdamW with little to no tuning and results in a network with smaller weight parameters. We also provide an extensive convergence analysis for ProxSPS that covers the non-smooth, smooth, weakly convex, and strongly convex settings.
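The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it contrasts. `sps_step` uses the capped Polyak step commonly found in the SPS literature, where the step is the gap between the sampled loss and its lower bound divided by the squared gradient norm, capped at `max_step`. `prox_sps_step_l2` is an assumed proximal variant for the specific regularizer (lam/2)·||x||², obtained by minimizing a truncated linear model of the loss plus the unlinearized regularizer and a proximal term. It may differ from the authors' exact ProxSPS update. All function and parameter names are hypothetical.

```python
import numpy as np


def sps_step(x, loss, grad, lower_bound=0.0, max_step=1.0):
    """One SGD step with a capped stochastic Polyak step size.

    `lower_bound` must bound the sampled objective from below.
    """
    g_sq = float(np.dot(grad, grad))
    if g_sq == 0.0:
        return x.copy()  # zero gradient: nothing to do
    step = min(max(loss - lower_bound, 0.0) / g_sq, max_step)
    return x - step * grad


def prox_sps_step_l2(x, loss, grad, lam, lower_bound=0.0, max_step=1.0):
    """Assumed proximal Polyak step for the regularizer (lam/2)*||x||^2.

    Solves  min_y max(loss + <grad, y - x>, lower_bound)
                  + (lam/2)*||y||^2 + ||y - x||^2 / (2*max_step)
    in closed form. Only a lower bound of the loss is needed,
    not of loss + regularizer.
    """
    scale = 1.0 + max_step * lam
    g_sq = float(np.dot(grad, grad))
    if g_sq == 0.0:
        return x / scale  # only the regularizer shrinks x
    gap = scale * (loss - lower_bound) - max_step * lam * float(np.dot(grad, x))
    tau = min(max_step, max(gap, 0.0) / g_sq)
    return (x - tau * grad) / scale


# Toy sample: loss(x) = 0.5 * (a.x - b)^2, whose lower bound is 0.
a, b = np.array([1.0, 2.0]), 3.0
x0 = np.zeros(2)
residual = float(a @ x0 - b)
loss0, grad0 = 0.5 * residual**2, residual * a

print("SPS step:     ", sps_step(x0, loss0, grad0))
print("Prox step, L2:", prox_sps_step_l2(x0, loss0, grad0, lam=0.1))
```

With `lam=0` the two functions coincide. With `lam > 0` the proximal version shrinks the iterate through the factor 1/(1 + max_step·lam) rather than folding the regularizer into the Polyak ratio, which is why the loss lower bound alone is enough.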

