Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action space or the objective multiplicity grows very large. This occurs, in part, because the variance of score-based gradient estimators scales quadratically with the number of targets. In this paper, we propose a causal baseline which exploits the independence structure encoded in a novel action-target influence network. Causal policy gradients (CPGs), which follow, provide a common framework for analysing key state-of-the-art algorithms, are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes. We provide an analysis of the proposed estimator and identify the conditions under which variance is guaranteed to improve. The algorithmic aspects of CPGs are also discussed, including optimal policy factorisations, their complexity, and the use of conditioning to scale efficiently to extremely large, concurrent tasks. The performance advantages of two variants of the algorithm are demonstrated on large-scale bandit and concurrent inventory-management problems.
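The variance behaviour described above, and the effect of pairing each score term only with the rewards it can influence, can be illustrated with a small score-function (REINFORCE-style) estimator. The sketch below is not the paper's algorithm: the factorised Gaussian policy, the diagonal influence mask `W` (a stand-in for the action-target influence network), and the per-target quadratic rewards are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: D action dimensions, M targets.
# W[i, j] = 1 if action dimension i influences target j; here we assume
# each dimension drives exactly one target (identity mask, D == M).
D, M = 4, 4
W = np.eye(D, M)

def sample_actions(mu, n):
    """Draw n actions from a factorised Gaussian policy N(mu, I)."""
    return mu + rng.standard_normal((n, D))

def score(a, mu):
    """Per-dimension score: d/d mu_i of log N(a | mu, I) is (a_i - mu_i)."""
    return a - mu

def target_rewards(a):
    """Toy per-target rewards; target j depends only on action dim j."""
    return -(a - 1.0) ** 2  # shape (n, M), maximised at a = 1

def vanilla_pg(mu, n=1_000):
    """Score-function gradient paired with the total reward (no structure)."""
    a = sample_actions(mu, n)
    R = target_rewards(a).sum(axis=1, keepdims=True)
    return (score(a, mu) * R).mean(axis=0)

def structured_pg(mu, n=1_000):
    """Structured estimator: each score term sees only the rewards of
    targets it can influence (a factored-baseline sketch, not CPG itself)."""
    a = sample_actions(mu, n)
    r = target_rewards(a)          # (n, M) per-target rewards
    return (score(a, mu) * (r @ W.T)).mean(axis=0)

mu = np.zeros(D)
g_v = np.stack([vanilla_pg(mu) for _ in range(200)])
g_s = np.stack([structured_pg(mu) for _ in range(200)])
print("vanilla estimator variance   :", g_v.var(axis=0).mean())
print("structured estimator variance:", g_s.var(axis=0).mean())
```

With the identity mask, the zero-mean cross terms that couple each score dimension to every other target's reward drop out of the structured estimator, so its empirical variance is markedly lower; those cross terms are exactly what drives the quadratic scaling in the unstructured case.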