Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, a finite-time analysis of Greedy-GQ was developed under linear function approximation and Markovian sampling, and the algorithm was shown to reach an $\epsilon$-stationary point with a sample complexity of the order $\mathcal{O}(\epsilon^{-3})$. Such a high sample complexity is due to the large variance induced by the Markovian samples. In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies an SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves much smaller bias and variance errors than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity of the order $\mathcal{O}(\epsilon^{-2})$. We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings.
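To make the abstract's reference to "SVRG-based variance reduction" concrete, the display below sketches the generic SVRG estimator that such schemes build on; the symbols $\theta_t$, $\tilde{\theta}$, $g_i$, and $\alpha$ are generic placeholders rather than this paper's notation, and the exact two time-scale form used in VR-Greedy-GQ is the one defined in the paper body.
% Illustrative sketch of a generic SVRG-style update (not the paper's exact update).
% A reference point \tilde{\theta} is refreshed at the start of each epoch, and the
% full gradient at \tilde{\theta} is reused to correct each stochastic gradient.
\[
  v_t \;=\; g_{i_t}(\theta_t) \;-\; g_{i_t}(\tilde{\theta}) \;+\; \frac{1}{N}\sum_{i=1}^{N} g_i(\tilde{\theta}),
  \qquad
  \theta_{t+1} \;=\; \theta_t - \alpha\, v_t .
\]
In the i.i.d. setting this correction keeps the estimator unbiased while its variance shrinks as $\theta_t$ approaches the reference point $\tilde{\theta}$; this variance-shrinking mechanism is what the abstract credits for improving the sample complexity from $\mathcal{O}(\epsilon^{-3})$ to $\mathcal{O}(\epsilon^{-2})$ under the paper's Markovian-sampling analysis.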