在Markov决策过程中将国家信任导致的人执行错误账户政策合成 (Synthesizing Policies That Account For Human Execution Errors Caused By State-Aliasing In Markov Decision Processes) - 专知论文

会员服务 ·

0

Processing（编程语言） · 似然 · 优化器 · 可辨认的 · Performer ·

2021 年 9 月 20 日

Synthesizing Policies That Account For Human Execution Errors Caused By State-Aliasing In Markov Decision Processes

翻译：在Markov决策过程中将国家信任导致的人执行错误账户政策合成

Sriram Gopalakrishnan,Mudit Verma,Subbarao Kambhampati

from arxiv, 7 page paper, 6 pages supplemental material

When humans are given a policy to execute, there can be policy execution errors and deviations in execution if there is uncertainty in identifying a state. So an algorithm that computes a policy for a human to execute ought to consider these effects in its computations. An optimal MDP policy that is poorly executed (because of a human agent) maybe much worse than another policy that is executed with fewer errors. In this paper, we consider the problems of erroneous execution and execution delay when computing policies for a human agent that would act in a setting modeled by a Markov Decision Process. We present a framework to model the likelihood of policy execution errors and likelihood of non-policy actions like inaction (delays) due to state uncertainty. This is followed by a hill climbing algorithm to search for good policies that account for these errors. We then use the best policy found by hill climbing with a branch and bound algorithm to find the optimal policy. We show experimental results in a Gridworld domain and analyze the performance of the two algorithms. We also present human studies that verify if our assumptions on policy execution by humans under state-aliasing are reasonable.

翻译：当人类被赋予执行政策时,如果在确定状态方面存在不确定性,执行中可能会出现政策执行错误和偏差。因此计算一个人执行的政策的算法应该在其计算中考虑到这些效应。最优的 MDP 政策执行不力(因为人为代理)可能比另一种政策差得多。在本文中,我们考虑了在根据Markov 决策程序模型设定的环境下操作的人类代理人计算政策时错误执行和执行延迟的问题。我们提出了一个框架,以模拟政策执行错误的可能性和由于国家不确定性而采取不作为(拖延)等非政策行动的可能性。接下来是山地攀爬算法,以寻找说明这些错误的好政策。然后我们用一个分支和捆绑算法的山坡爬发现的最佳政策找到最佳政策。我们用Gridworld域显示实验结果,分析两种算法的性能。我们还提出人类研究,以核实我们对受国家指责的人类政策执行的假设是否合理。

0

相关内容

Processing（编程语言）

Processing（编程语言）

Processing 是一门开源编程语言和与之配套的集成开发环境（IDE）的名称。Processing 在电子艺术和视觉设计社区被用来教授编程基础，并运用于大量的新媒体和互动艺术作品中。

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【DeepMind】强化学习教程，83页ppt

【DeepMind】强化学习教程，83页ppt

专知会员服务

158+阅读 · 2020年8月7日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

专知会员服务

41+阅读 · 2019年12月27日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes

Arxiv

0+阅读 · 2021年11月12日

A Bandit Model for Human-Machine Decision Making with Private Information and Opacity

Arxiv

0+阅读 · 2021年11月11日

Reinforcement Learning for Robotic Manipulation using Simulated Locomotion Demonstrations

Arxiv

0+阅读 · 2021年11月11日

Arxiv

0+阅读 · 2021年11月11日

Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

Arxiv

6+阅读 · 2019年4月3日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

At Human Speed: Deep Reinforcement Learning with Action Delay

Arxiv

4+阅读 · 2018年10月16日

Learning under Misspecified Objective Spaces

Arxiv

3+阅读 · 2018年10月11日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

【DeepMind】强化学习教程，83页ppt

【DeepMind】强化学习教程，83页ppt

专知会员服务

158+阅读 · 2020年8月7日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

专知会员服务

41+阅读 · 2019年12月27日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

兵棋系统文档：联合战区级模拟-全球行动（JTLS-GO®）

【普林斯顿博士论文】面向人本机器人学的安全与学习博弈论融合

从无人机到数据：揭示边缘计算作为新作战域

综述：机器嗅觉与嵌入式人工智能正在塑造新的全球传感产业

相关资讯

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

【今日新增】IEEE Trans.专刊截稿信息8条

【今日新增】IEEE Trans.专刊截稿信息8条

Call4Papers

7+阅读 · 2017年6月29日

相关论文

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes

Arxiv

0+阅读 · 2021年11月12日

A Bandit Model for Human-Machine Decision Making with Private Information and Opacity

Arxiv

0+阅读 · 2021年11月11日

Reinforcement Learning for Robotic Manipulation using Simulated Locomotion Demonstrations

Arxiv

0+阅读 · 2021年11月11日

Arxiv

0+阅读 · 2021年11月11日

Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering

Arxiv

6+阅读 · 2019年4月3日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

Reward learning from human preferences and demonstrations in Atari

Arxiv

8+阅读 · 2018年11月15日

At Human Speed: Deep Reinforcement Learning with Action Delay

Arxiv

4+阅读 · 2018年10月16日

Learning under Misspecified Objective Spaces

Arxiv

3+阅读 · 2018年10月11日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

微信扫码咨询专知VIP会员