对贴贴现 MDP 的遗憾封框 (Regret Bounds for Discounted MDPs) - 专知论文

会员服务 ·

0

学习器 · Performer · Better · 极大 · SAT ·

2021 年 5 月 27 日

Regret Bounds for Discounted MDPs

翻译：对贴贴现 MDP 的遗憾封框

Shuang Liu,Hao Su

Reinforcement learning (RL) has traditionally been understood from an episodic perspective; the concept of non-episodic RL, where there is no restart and therefore no reliable recovery, remains elusive. A fundamental question in non-episodic RL is how to measure the performance of a learner and derive algorithms to maximize such performance. Conventional wisdom is to maximize the difference between the average reward received by the learner and the maximal long-term average reward. In this paper, we argue that if the total time budget is relatively limited compared to the complexity of the environment, such comparison may fail to reflect the finite-time optimality of the learner. We propose a family of measures, called $\gamma$-regret, which we believe to better capture the finite-time optimality. We give motivations and derive lower and upper bounds for such measures. Note: A follow-up work (arXiv:2010.00587) has improved both our lower and upper bound, the gap is now closed at $\tilde{\Theta}\left(\frac{\sqrt{SAT}}{(1 - \gamma)^{\frac{1}{2}}}\right)$.

翻译：强化学习(RL)传统上是从零散的角度理解的;非episodic RL(没有重新启动,因此没有可靠的恢复)的概念仍然难以实现。非episodic RL的一个基本问题是如何衡量学习者的表现并得出算法以最大限度地提高这种表现。传统智慧是最大限度地扩大学习者得到的平均奖励与最高长期平均奖励之间的差别。在本文中,我们争论说,如果总预算相对于环境的复杂性而言相对有限,这种比较可能无法反映学习者的有限时间最佳性。我们建议了一套措施,称为$\gamma$-regret,我们认为这组措施可以更好地捕捉到有限的时间最佳性。我们给出了动力,为这类措施提供了下下限和上限。注意:后续工作(arXiv:2010.00587)改善了我们的下限和上限,差距现在在$\telde_left_laft(\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

0

相关内容

学习器

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【AAAI2020教程】强化学习中的Exploration-Exploitation in Reinforcement Learning

专知会员服务

101+阅读 · 2020年2月8日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

专知会员服务

34+阅读 · 2019年3月21日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

已删除

将门创投

5+阅读 · 2018年1月24日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Reward-Weighted Regression Converges to a Global Optimum

Arxiv

0+阅读 · 2021年7月19日

Algorithms for general hard-constraint point processes via discretization

Arxiv

0+阅读 · 2021年7月19日

Lower bounds for invariant statistical models with applications to principal component analysis

Arxiv

0+阅读 · 2021年7月19日

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Arxiv

0+阅读 · 2021年7月18日

Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning

Arxiv

0+阅读 · 2021年7月17日

Refined Policy Improvement Bounds for MDPs

Arxiv

0+阅读 · 2021年7月16日

Linear Programming Bounds for Almost-Balanced Binary Codes

Arxiv

0+阅读 · 2021年7月16日

Accelerated Methods for Deep Reinforcement Learning

Accelerated Methods for Deep Reinforcement Learning

Arxiv

6+阅读 · 2019年1月10日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

VIP会员

文章信息

相关主题

相关VIP内容

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

商业数据分析，39页ppt

商业数据分析，39页ppt

专知会员服务

165+阅读 · 2020年6月2日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【AAAI2020教程】强化学习中的Exploration-Exploitation in Reinforcement Learning

专知会员服务

101+阅读 · 2020年2月8日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

【ALT 2019 Tutorials】强化学习的探索性开发（Exploration-Exploitation in Reinforcement Learning）

专知会员服务

34+阅读 · 2019年3月21日

热门VIP内容

开通专知VIP会员享更多权益服务

《步兵小单元山地严寒作战指南》美军最新条令200页

《联合作战概念的发展》最新报告

俄制无人机弹药

《复杂场景下自主着陆的模型预测控制技术》92页

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

已删除

将门创投

5+阅读 · 2018年1月24日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Reward-Weighted Regression Converges to a Global Optimum

Arxiv

0+阅读 · 2021年7月19日

Algorithms for general hard-constraint point processes via discretization

Arxiv

0+阅读 · 2021年7月19日

Lower bounds for invariant statistical models with applications to principal component analysis

Arxiv

0+阅读 · 2021年7月19日

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Arxiv

0+阅读 · 2021年7月18日

Rate-Distortion Analysis of Minimum Excess Risk in Bayesian Learning

Arxiv

0+阅读 · 2021年7月17日

Refined Policy Improvement Bounds for MDPs

Arxiv

0+阅读 · 2021年7月16日

Linear Programming Bounds for Almost-Balanced Binary Codes

Arxiv

0+阅读 · 2021年7月16日

Accelerated Methods for Deep Reinforcement Learning

Accelerated Methods for Deep Reinforcement Learning

Arxiv

6+阅读 · 2019年1月10日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Variance Reduction Methods for Sublinear Reinforcement Learning

Arxiv

4+阅读 · 2018年4月25日

微信扫码咨询专知VIP会员