To collaborate well with robots, we must be able to understand their decision making. Humans naturally infer other agents' beliefs and desires by reasoning about their observable behavior in a way that resembles inverse reinforcement learning (IRL). Thus, robots can convey their beliefs and desires by providing demonstrations that are informative for a human learner's IRL. An informative demonstration is one that differs markedly from what the learner expects the robot to do, given their current understanding of the robot's decision making. However, standard IRL does not model the learner's existing expectations, and thus cannot perform this counterfactual reasoning. We propose to incorporate the learner's current understanding of the robot's decision making into our model of human IRL, so that a robot can select demonstrations that maximize the human's understanding. We also propose a novel measure for estimating how difficult it is for a human to predict instances of a robot's behavior in unseen environments. A user study finds that our test difficulty measure correlates well with human performance and confidence. Interestingly, accounting for human beliefs and counterfactuals when selecting demonstrations decreases human performance on easy tests but increases performance on difficult tests, providing insight into how best to utilize such models.
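To make the demonstration-selection idea concrete, below is a minimal sketch, not the paper's implementation: the candidate environments, the linear reward features, and the sample-based representation of the learner's belief over the robot's reward weights are all illustrative assumptions. It scores each candidate demonstration by how often the robot's actual choice would violate the learner's counterfactual prediction under their current belief, and picks the most surprising one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: each candidate environment offers a few trajectory
# options, each summarized by its reward feature counts (rows = trajectories).
candidate_envs = [rng.normal(size=(4, 3)) for _ in range(5)]

w_robot = np.array([1.0, -0.5, 0.2])   # robot's true reward weights (assumed known to the robot)
belief = rng.normal(size=(50, 3))      # learner's belief: weight samples over the robot's reward

def best_traj(features: np.ndarray, w: np.ndarray) -> int:
    """Index of the trajectory maximizing the linear reward w . phi."""
    return int(np.argmax(features @ w))

def informativeness(features: np.ndarray) -> float:
    """Fraction of the learner's belief samples whose predicted robot behavior
    differs from the robot's actual behavior (counterfactual surprise)."""
    actual = best_traj(features, w_robot)
    predicted = [best_traj(features, w) for w in belief]
    return float(np.mean([p != actual for p in predicted]))

# Select the environment whose demonstration most violates the learner's
# expectations, i.e. is most informative for updating their belief via IRL.
scores = [informativeness(f) for f in candidate_envs]
best_env = int(np.argmax(scores))
print(f"most informative demonstration environment: {best_env}, score {scores[best_env]:.2f}")
```

In this sketch, a standard IRL-style selection would ignore `belief` entirely; modeling the learner's current expectations is what makes the chosen demonstration counterfactually surprising rather than merely representative of the robot's reward.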