Stochastic Anderson 用于非convex 斯托卡优化混合 (Stochastic Anderson Mixing for Nonconvex Stochastic Optimization) - 专知论文

会员服务 ·

0

非凸 · 优化器 · 几乎必然收敛 · 方差减小 · 混合 ·

2021 年 10 月 4 日

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

翻译：Stochastic Anderson 用于非convex 斯托卡优化混合

Fuchao Wei,Chenglong Bao,Yang Liu

from arxiv, Accepted by the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

Anderson mixing (AM) is an acceleration method for fixed-point iterations. Despite its success and wide usage in scientific computing, the convergence theory of AM remains unclear, and its applications to machine learning problems are not well explored. In this paper, by introducing damped projection and adaptive regularization to classical AM, we propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems. Under mild assumptions, we establish the convergence theory of SAM, including the almost sure convergence to stationary points and the worst-case iteration complexity. Moreover, the complexity bound can be improved when randomly choosing an iterate as the output. To further accelerate the convergence, we incorporate a variance reduction technique into the proposed SAM. We also propose a preconditioned mixing strategy for SAM which can empirically achieve faster convergence or better generalization ability. Finally, we apply the SAM method to train various neural networks including the vanilla CNN, ResNets, WideResNet, ResNeXt, DenseNet and RNN. Experimental results on image classification and language model demonstrate the advantages of our method.

翻译：安德森混合(AM)是固定点迭代的加速方法。尽管在科学计算中取得了成功且使用范围很广,但AM的趋同理论仍然不清楚,而且对机器学习问题的应用也没有很好地探讨。本文通过引入有缺陷的投影和对古典AM的适应性整顿,我们提出了一个Stochasteic Anderson混合(SAM)计划,以解决非convex蒸馏优化问题。在温和假设下,我们建立了SAM的趋同理论,包括几乎可以肯定地与固定点的趋同和最坏的迭代的复杂性。此外,如果随机选择一个变相作为输出,则复杂性可以改进。为了进一步加快这种趋同,我们还在拟议的SAM中纳入了一种减少差异的技术。我们还提议了SAM的前提条件混合战略,以实验方式实现更快的趋同或更概括化能力。最后,我们运用SAM方法来培训各种神经网络,包括香草CNN、ResNets、WideResNet、ResNET、ResNET、ResNEET、DenseNet和RNNNNNN。关于图像分类和语言模型的实验结果显示了我们的方法的优点。

0

相关内容

【NeurIPS2021】非凸从动件的基于梯度的双层优化

专知会员服务

13+阅读 · 2021年10月12日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

58+阅读 · 2020年11月21日

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

专知会员服务

69+阅读 · 2020年6月6日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

6+阅读 · 2019年4月10日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

推荐｜Andrew Ng计算机视觉教程总结

推荐｜Andrew Ng计算机视觉教程总结

全球人工智能

3+阅读 · 2017年11月23日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】卷积神经网络类间不平衡问题系统研究

【推荐】卷积神经网络类间不平衡问题系统研究

机器学习研究会

6+阅读 · 2017年10月18日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint

Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint

Arxiv

0+阅读 · 2021年11月26日

A deep learning based reduced order modeling for stochastic underground flow problems

Arxiv

0+阅读 · 2021年11月26日

Stochastic optimization under distributional drift

Arxiv

0+阅读 · 2021年11月25日

Randomized Stochastic Gradient Descent Ascent

Arxiv

0+阅读 · 2021年11月25日

Stochastic Iterative Graph Matching

Arxiv

6+阅读 · 2021年6月4日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Arxiv

3+阅读 · 2018年10月1日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

几乎必然收敛

相关VIP内容

【NeurIPS2021】非凸从动件的基于梯度的双层优化

专知会员服务

13+阅读 · 2021年10月12日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

58+阅读 · 2020年11月21日

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

专知会员服务

69+阅读 · 2020年6月6日

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

【NeurIPS 2019|经典论文奖】正则随机学习和在线优化的双重平均法（Dual Averaging Method for Regularized Stochastic Learning and Online Optimization），微软研究院Lin Xiao

专知会员服务

17+阅读 · 2019年12月9日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美国海军陆战队软件定义网络应用案例：分布式防火墙自动化系统》148页

《多体环境下定位导航授时（PNT）系统研究》228页

软件定义无线电（SDR）：商业与军事领域的技术、应用及未来趋势

《攻势防空作战中无人追击者/规避者最优轨迹研究（含动态交战区建模）》95页

相关资讯

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

6+阅读 · 2019年4月10日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

推荐｜Andrew Ng计算机视觉教程总结

推荐｜Andrew Ng计算机视觉教程总结

全球人工智能

3+阅读 · 2017年11月23日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】卷积神经网络类间不平衡问题系统研究

【推荐】卷积神经网络类间不平衡问题系统研究

机器学习研究会

6+阅读 · 2017年10月18日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint

Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint

Arxiv

0+阅读 · 2021年11月26日

A deep learning based reduced order modeling for stochastic underground flow problems

Arxiv

0+阅读 · 2021年11月26日

Stochastic optimization under distributional drift

Arxiv

0+阅读 · 2021年11月25日

Randomized Stochastic Gradient Descent Ascent

Arxiv

0+阅读 · 2021年11月25日

Stochastic Iterative Graph Matching

Arxiv

6+阅读 · 2021年6月4日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Arxiv

3+阅读 · 2018年10月1日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员