Nearly Optimal Policy Optimization with Stable at Any Time Guarantee


December 22, 2021


Tianhao Wu, Yunchang Yang, Han Zhong, Liwei Wang, Simon S. Du, Jiantao Jiao

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. However, the theoretical understanding of these methods remains insufficient. Even in the episodic (time-inhomogeneous) tabular setting, the state-of-the-art theoretical result for policy-based methods, due to Shani et al. (2020), is only a $\tilde{O}(\sqrt{S^2AH^4K})$ regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the horizon, and $K$ is the number of episodes. This leaves a $\sqrt{SH}$ gap compared with the information-theoretic lower bound $\tilde{\Omega}(\sqrt{SAH^3K})$. To bridge this gap, we propose a novel algorithm, Reference-based Policy Optimization with Stable at Any Time guarantee (RPO-SAT), which features the property "Stable at Any Time". We prove that our algorithm achieves $\tilde{O}(\sqrt{SAH^3K} + \sqrt{AH^4K})$ regret. When $S > H$, our algorithm is minimax optimal up to logarithmic factors. To the best of our knowledge, RPO-SAT is the first computationally efficient, nearly minimax optimal policy-based algorithm for tabular RL.
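The relationship between the three rates quoted in the abstract can be checked numerically. The sketch below is only an illustration: it drops all constants and logarithmic factors hidden by the $\tilde{O}$ and $\tilde{\Omega}$ notation, the function names are hypothetical, and the problem sizes are made up for the example. It shows that the earlier upper bound exceeds the lower bound by a factor of $\sqrt{SH}$, while the new bound exceeds it by $1 + \sqrt{H/S}$, which stays below 2 whenever $S > H$.

```python
import math

# Leading-order rates from the abstract, with constants and log factors dropped.
# The function names are hypothetical and chosen only for this illustration.

def prior_upper(S, A, H, K):
    """Earlier policy-based rate: sqrt(S^2 * A * H^4 * K)."""
    return math.sqrt(S**2 * A * H**4 * K)

def lower_bound(S, A, H, K):
    """Information-theoretic lower bound rate: sqrt(S * A * H^3 * K)."""
    return math.sqrt(S * A * H**3 * K)

def rpo_sat_upper(S, A, H, K):
    """Rate stated for RPO-SAT: sqrt(S*A*H^3*K) + sqrt(A*H^4*K)."""
    return math.sqrt(S * A * H**3 * K) + math.sqrt(A * H**4 * K)

# Made-up problem sizes with S > H (an assumption for this example).
S, A, H, K = 100, 5, 10, 10_000

# Ratio of each upper bound to the lower bound.
gap_prior = prior_upper(S, A, H, K) / lower_bound(S, A, H, K)
gap_new = rpo_sat_upper(S, A, H, K) / lower_bound(S, A, H, K)

print(f"prior / lower   = {gap_prior:.3f}  (sqrt(S*H)     = {math.sqrt(S * H):.3f})")
print(f"RPO-SAT / lower = {gap_new:.3f}  (1 + sqrt(H/S) = {1 + math.sqrt(H / S):.3f})")
```

Because the second ratio is bounded by a constant when $S > H$, the two rates match up to the ignored logarithmic factors, which is the sense in which the abstract calls the algorithm minimax optimal.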

