Nys-Curve: Nyström-Approximated Curvature for Stochastic Optimization


October 16, 2021


Hardik Tankaria, Dinesh Singh, Makoto Yamada

Quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation approximates the Newton step poorly because it relies only on first-order derivatives. In this study, we propose an approximate Newton step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions with linear convergence rates. Specifically, we compute a partial Hessian of size $d \times k$, consisting of the columns for $k \ll d$ randomly selected variables, and then use the Nyström method to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step $\Delta\boldsymbol{w}$ without computing or storing the full Hessian or its inverse. Furthermore, to address large-scale scenarios in which even computing a partial Hessian may require significant time, we use distribution-preserving (DP) sub-sampling to compute the partial Hessian. DP sub-sampling generates $p$ sub-samples with similar first- and second-order distribution statistics and selects a single sub-sample at each epoch in a round-robin manner to compute the partial Hessian. We integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve the logistic regression problem. The numerical experiments show that the proposed approach obtains a better approximation of Newton's method, with performance competitive with state-of-the-art first-order and stochastic quasi-Newton methods.
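The core idea of the abstract, building a Nyström approximation from $k$ Hessian columns and computing $\Delta\boldsymbol{w}$ without ever forming a $d \times d$ matrix, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes an L2-regularized logistic regression loss, a damping term `rho` added to the approximated Hessian, and the Woodbury identity for the solve; the function names `partial_hessian` and `nystrom_step` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def partial_hessian(X, w, idx, lam):
    """Columns `idx` of the L2-regularized logistic-regression Hessian (d x k).

    Full Hessian: H = X^T diag(p * (1 - p)) X / n + lam * I. Only k columns are built.
    """
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
    s = p * (1.0 - p)                         # per-sample curvature weights
    C = X.T @ (s[:, None] * X[:, idx]) / n    # d x k block of the data term
    C[idx, np.arange(len(idx))] += lam        # columns `idx` of lam * I
    return C

def nystrom_step(C, idx, g, rho):
    """Solve (C M^+ C^T + rho * I) dw = g using only d x k and k x k matrices.

    M = C[idx, :] is the k x k block of sampled rows and columns.
    `rho` is an assumed damping parameter, not a value taken from the paper.
    """
    M = C[idx, :]
    M = 0.5 * (M + M.T)                       # symmetrize against round-off
    evals, evecs = np.linalg.eigh(M)
    keep = evals > 1e-10 * evals.max()        # drop numerically null directions
    # Z Z^T equals the Nystrom approximation C M^+ C^T
    Z = C @ (evecs[:, keep] / np.sqrt(evals[keep]))
    # Woodbury: (rho I + Z Z^T)^{-1} g = (g - Z (rho I + Z^T Z)^{-1} Z^T g) / rho
    small = rho * np.eye(Z.shape[1]) + Z.T @ Z
    return (g - Z @ np.linalg.solve(small, Z.T @ g)) / rho

# Synthetic problem, for illustration only
rng = np.random.default_rng(0)
n, d, k = 200, 30, 8
X = rng.standard_normal((n, d))
y = (rng.random(n) < 0.5).astype(float)       # labels in {0, 1}
w = np.zeros(d)
lam, rho = 1e-2, 1e-1

idx = rng.choice(d, size=k, replace=False)    # k randomly selected variables
C = partial_hessian(X, w, idx, lam)
p = 1.0 / (1.0 + np.exp(-X @ w))
g = X.T @ (p - y) / n + lam * w               # gradient of the regularized loss
dw = nystrom_step(C, idx, g, rho)
w_new = w - dw                                # one damped approximate Newton step
```

Under these assumptions, building `C` costs $O(ndk)$ and the solve costs $O(dk^2)$, compared with $O(nd^2)$ and $O(d^3)$ for an exact Newton step. In a stochastic variant, `X` and `y` would be a mini-batch or one of the sub-samples rather than the full data set.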

