In the optimization of dynamic systems, the variables typically have constraints. Such problems can be modeled as a Constrained Markov Decision Process (CMDP). This paper considers peak constraints, where the agent chooses a policy that maximizes the long-term average reward while satisfying the constraints at every time step. We propose a model-free algorithm that converts the CMDP into an unconstrained problem, to which a Q-learning based approach is applied. We extend the notion of probably approximately correct (PAC) learning to define a criterion for an $\epsilon$-optimal policy. The proposed algorithm is proved to achieve an $\epsilon$-optimal policy with probability at least $1-p$ when the number of episodes $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$, where $S$ and $A$ are the numbers of states and actions, respectively, $H$ is the number of steps per episode, $I$ is the number of constraint functions, and $\ell=\log(\frac{SAT}{p})$. To the best of our knowledge, this is the first PAC-type analysis for CMDPs with peak constraints where the transition probabilities are not known a priori. We demonstrate the proposed algorithm on an energy-harvesting problem, where it outperforms the state of the art and performs close to the theoretical upper bound of the studied optimization problem.
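For concreteness, a peak-constrained CMDP of the kind summarized above can be written as the following program; the symbols $r_h$, $g_{i,h}$, and $c_i$ are illustrative notation for the reward, the $i$-th constraint function, and its threshold, and are not necessarily the paper's exact formulation:
\begin{align*}
\max_{\pi}\; & \mathbb{E}^{\pi}\!\left[\frac{1}{H}\sum_{h=1}^{H} r_h(s_h, a_h)\right] \\
\text{s.t.}\; & g_{i,h}(s_h, a_h) \le c_i \quad \text{for all } i \in \{1,\dots,I\},\ h \in \{1,\dots,H\},
\end{align*}
i.e., the constraints must hold at every step (peak constraints) rather than only in expectation over an episode. Under this reading, a policy $\hat{\pi}$ would be called $\epsilon$-optimal if its average reward is within $\epsilon$ of the constrained optimum while each peak constraint is violated by at most $\epsilon$; this is one natural instantiation of the PAC-style criterion mentioned above, not necessarily the paper's precise definition.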