Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state aggregation, where the state space is partitioned and either the policy or the value-function approximation is held constant over partitions. It shows that a policy gradient method converges to a policy whose per-period regret is bounded by $\epsilon$, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $\epsilon/(1-\gamma)$, where $\gamma$ is the discount factor. The theoretical results synthesize recent analyses of policy gradient methods with the insights of Van Roy (2006) into the critical role of state-relevance weights in approximate dynamic programming.
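For concreteness, the approximation-error parameter described in words above can be sketched as a worked equation. The partition notation $\{\mathcal{S}_k\}_{k=1}^{K}$ and the choice of which policy's $Q$-function enters the maximum are illustrative assumptions, not details stated in the abstract itself.

% A minimal sketch, assuming the state-action pairs are grouped into
% cells \mathcal{S}_1, \dots, \mathcal{S}_K and Q^{\pi} denotes the
% state-action value function of the relevant policy \pi.
\[
  \epsilon \;=\; \max_{k \le K}\;
  \max_{(s,a),\,(s',a') \in \mathcal{S}_k}
  \bigl|\, Q^{\pi}(s,a) - Q^{\pi}(s',a') \,\bigr|,
\]
% so the comparison in the abstract is between a per-period regret of
% order \epsilon (policy gradient) and one of order \epsilon/(1-\gamma)
% (approximate policy iteration and approximate value iteration).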