因果政策排名 (Causal policy ranking) - 专知论文

会员服务 ·

0

秩 · 估计/估计量 · Integration · SimPLe · 黑盒 ·

2021 年 11 月 16 日

Causal policy ranking

翻译：因果政策排名

Daniel McNamee,Hana Chockler

from arxiv, preprint; 6 pages. arXiv admin note: substantial text overlap with arXiv:2008.13607

Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with $n$ time steps, a policy will make $n$ decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward and how significant is their contribution. Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect that these decisions have on reward attainment and ranks the decisions according to this estimate. In this preliminary work, we compare our measure against an alternative, non-causal, ranking procedure, highlight the benefits of causality-based policy ranking, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.

翻译：通过强化学习(RL)培训的政策往往非常复杂,即使是简单的任务也是如此。在花费10美元的时间步骤的情况下,一项政策将就所要采取的行动做出不值一提的决定,其中很多行动对观察员来说似乎是不直观的。此外,还不清楚这些决定中哪些直接有助于实现奖赏,以及其贡献有多大。根据经过培训的政策,我们提出了一个基于反事实推理的黑盒法,该黑盒法估计这些决定对奖励成就的因果关系并根据这一估计对决定进行排序。在这个初步工作中,我们将我们的措施与替代的、非因果性、排名程序进行比较,突出基于因果关系的政策排名的好处,并讨论将因果算法纳入解释RL代理政策的未来工作。

0

相关内容

【伯克利-Pieter Abbeel】深度强化学习基础，附slides与视频

专知会员服务

29+阅读 · 2021年8月26日

【KDD2021】基于因果反事实Shapley的MARL信度分配

专知会员服务

19+阅读 · 2021年7月11日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

448页伊利诺伊大学《算法》图书-附下载

448页伊利诺伊大学《算法》图书-附下载

专知

15+阅读 · 2018年12月31日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Causal Analysis Framework for Recommendation

Arxiv

0+阅读 · 2022年1月18日

Learning Neural Ranking Models Online from Implicit User Feedback

Arxiv

0+阅读 · 2022年1月17日

Settling the Variance of Multi-Agent Policy Gradients

Arxiv

8+阅读 · 2021年8月20日

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Arxiv

7+阅读 · 2021年6月22日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning

Arxiv

7+阅读 · 2021年4月14日

Unifying Online and Counterfactual Learning to Rank

Arxiv

6+阅读 · 2020年12月8日

Meta Learning for Causal Direction

Meta Learning for Causal Direction

Arxiv

5+阅读 · 2020年7月6日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Causal Embeddings for Recommendation

Arxiv

23+阅读 · 2018年8月3日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【伯克利-Pieter Abbeel】深度强化学习基础，附slides与视频

专知会员服务

29+阅读 · 2021年8月26日

【KDD2021】基于因果反事实Shapley的MARL信度分配

专知会员服务

19+阅读 · 2021年7月11日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

448页伊利诺伊大学《算法》图书-附下载

448页伊利诺伊大学《算法》图书-附下载

专知

15+阅读 · 2018年12月31日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Causal Analysis Framework for Recommendation

Arxiv

0+阅读 · 2022年1月18日

Learning Neural Ranking Models Online from Implicit User Feedback

Arxiv

0+阅读 · 2022年1月17日

Settling the Variance of Multi-Agent Policy Gradients

Arxiv

8+阅读 · 2021年8月20日

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Arxiv

7+阅读 · 2021年6月22日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning

Arxiv

7+阅读 · 2021年4月14日

Unifying Online and Counterfactual Learning to Rank

Arxiv

6+阅读 · 2020年12月8日

Meta Learning for Causal Direction

Meta Learning for Causal Direction

Arxiv

5+阅读 · 2020年7月6日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Causal Embeddings for Recommendation

Arxiv

23+阅读 · 2018年8月3日

微信扫码咨询专知VIP会员