Gaussian 使用微型散货碎屑梯底的工艺推论:聚合保障和实证效益 (Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits) - 专知论文

会员服务 ·

0

随机梯度下降 · SGD · 小批量 · 泛化理论 · 估计/估计量 ·

2021 年 11 月 19 日

Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits

翻译：Gaussian 使用微型散货碎屑梯底的工艺推论:聚合保障和实证效益

Hao Chen,Lili Zheng,Raed Al Kontar,Garvesh Raskutti

Stochastic gradient descent (SGD) and its variants have established themselves as the go-to algorithms for large-scale machine learning problems with independent samples due to their generalization performance and intrinsic computational advantage. However, the fact that the stochastic gradient is a biased estimator of the full gradient with correlated samples has led to the lack of theoretical understanding of how SGD behaves under correlated settings and hindered its use in such cases. In this paper, we focus on hyperparameter estimation for the Gaussian process (GP) and take a step forward towards breaking the barrier by proving minibatch SGD converges to a critical point of the full log-likelihood loss function, and recovers model hyperparameters with rate $O(\frac{1}{K})$ for $K$ iterations, up to a statistical error term depending on the minibatch size. Our theoretical guarantees hold provided that the kernel functions exhibit exponential or polynomial eigendecay which is satisfied by a wide range of kernels commonly used in GPs. Numerical studies on both simulated and real datasets demonstrate that minibatch SGD has better generalization over state-of-the-art GP methods while reducing the computational burden and opening a new, previously unexplored, data size regime for GPs.

翻译：软性梯度(SGD) 及其变种已经确立为对独立样本的大规模机器学习问题进行自我测算的算法, 其原因是其一般性能和内在计算优势。然而, 软性梯度是全梯度的偏差估测器, 与相关样品相交, 导致对 SGD 在相关设置下的行为方式缺乏理论理解, 并妨碍在此类情况下使用SGOS。在本文中, 我们侧重于高斯进程( GP) 的超光度估计, 并朝着打破屏障迈出了一步, 证明迷性球 SGD 与全日志相似丢失功能的临界点汇合, 并用美元(\\ frac{ 1 ⁇ K}) 和美元( $K) 的循环率回收模型超度度计, 导致对SGDGD 如何在相关情况下的行为缺乏理论上的错误术语。我们的理论保证是, 内核函数显示指数或多核基因基因分解, 这得到在GPGD中常用的一系列广泛使用的内核内核分解系统所满足, 。在模拟和新计算方法上, 模拟和真实数据分析中, 改进的模型的新的计算方法都显示, 改进了模拟和模拟和精确分析方法的新的计算方法的缩压后, 改进了总化方法的新的计算, 减少了了新的计算方法, 改进了新的计算方法。

0

相关内容

随机梯度下降

随机梯度下降

随机梯度下降，按照数据生成分布抽取m个样本，通过计算他们梯度的平均值来更新梯度。

【ICML2021】随机森林机器遗忘

专知会员服务

21+阅读 · 2021年8月9日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【ICML 2020】设置LayerNorm使Transformer加速收敛

专知会员服务

16+阅读 · 2020年7月27日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

图机器学习-图拉普拉斯算子的离散正则性，141页ppt，Discrete regularity graph Laplacians

专知会员服务

29+阅读 · 2020年6月4日

【IPAM 】张量主元分析中的高维成本景观和梯度下降及其推广（High-dimensional cost landscape and gradient descent in Tensor PCA and its generalisations），附41页pdf

【IPAM 】张量主元分析中的高维成本景观和梯度下降及其推广（High-dimensional cost landscape and gradient descent in Tensor PCA and its generalisations），附41页pdf

专知会员服务

13+阅读 · 2019年11月22日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

BAT题库 | 机器学习面试1000题系列（第161~165题）

BAT题库 | 机器学习面试1000题系列（第161~165题）

七月在线实验室

7+阅读 · 2017年11月6日

干货 | 如何理解深度学习分布式训练中的large batch size与learning rate的关系？

干货 | 如何理解深度学习分布式训练中的large batch size与learning rate的关系？

AI科技评论

5+阅读 · 2017年11月2日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Stochastic Coded Federated Learning with Convergence and Privacy Guarantees

Arxiv

0+阅读 · 2022年1月25日

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity

Arxiv

0+阅读 · 2022年1月24日

Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems

Arxiv

0+阅读 · 2022年1月24日

Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks

Arxiv

0+阅读 · 2022年1月23日

Differentially Private SGDA for Minimax Problems

Arxiv

0+阅读 · 2022年1月22日

Asymptotic approximation of the likelihood of stationary determinantal point processes

Asymptotic approximation of the likelihood of stationary determinantal point processes

Arxiv

0+阅读 · 2022年1月21日

High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees

Arxiv

0+阅读 · 2022年1月21日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

8+阅读 · 2021年4月22日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

VIP会员

文章信息

相关主题

随机梯度下降

估计/估计量

相关VIP内容

【ICML2021】随机森林机器遗忘

专知会员服务

21+阅读 · 2021年8月9日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【ICML 2020】设置LayerNorm使Transformer加速收敛

专知会员服务

16+阅读 · 2020年7月27日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

图机器学习-图拉普拉斯算子的离散正则性，141页ppt，Discrete regularity graph Laplacians

专知会员服务

29+阅读 · 2020年6月4日

【IPAM 】张量主元分析中的高维成本景观和梯度下降及其推广（High-dimensional cost landscape and gradient descent in Tensor PCA and its generalisations），附41页pdf

【IPAM 】张量主元分析中的高维成本景观和梯度下降及其推广（High-dimensional cost landscape and gradient descent in Tensor PCA and its generalisations），附41页pdf

专知会员服务

13+阅读 · 2019年11月22日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】以人为中心的强化学习

任务规划与地形分析：现代复杂环境作战导航体系

认知优势：人工智能在国家安全决策中的核心作用

大模型赋能的具身智能：决策与具身学习综述

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

人工智能 | 国际会议截稿信息9条

人工智能 | 国际会议截稿信息9条

Call4Papers

4+阅读 · 2018年3月13日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

BAT题库 | 机器学习面试1000题系列（第161~165题）

BAT题库 | 机器学习面试1000题系列（第161~165题）

七月在线实验室

7+阅读 · 2017年11月6日

干货 | 如何理解深度学习分布式训练中的large batch size与learning rate的关系？

干货 | 如何理解深度学习分布式训练中的large batch size与learning rate的关系？

AI科技评论

5+阅读 · 2017年11月2日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Stochastic Coded Federated Learning with Convergence and Privacy Guarantees

Arxiv

0+阅读 · 2022年1月25日

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity

Arxiv

0+阅读 · 2022年1月24日

Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems

Arxiv

0+阅读 · 2022年1月24日

Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks

Arxiv

0+阅读 · 2022年1月23日

Differentially Private SGDA for Minimax Problems

Arxiv

0+阅读 · 2022年1月22日

Asymptotic approximation of the likelihood of stationary determinantal point processes

Asymptotic approximation of the likelihood of stationary determinantal point processes

Arxiv

0+阅读 · 2022年1月21日

High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees

Arxiv

0+阅读 · 2022年1月21日

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Arxiv

8+阅读 · 2021年4月22日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

Arxiv

4+阅读 · 2019年5月9日

微信扫码咨询专知VIP会员