When humans are given a policy to execute, there can be policy execution errors and deviations in execution if there is uncertainty in identifying a state. An algorithm that computes a policy for a human to execute therefore ought to consider these effects in its computations. An optimal MDP policy that is poorly executed (because of a human agent) may be much worse than another policy that is executed with fewer errors. In this paper, we consider the problems of erroneous execution and execution delay when computing policies for a human agent acting in a setting modeled by a Markov Decision Process (MDP). We present a framework to model the likelihood of policy execution errors and of non-policy actions such as inaction (delays) due to state uncertainty. We then give a hill climbing algorithm to search for good policies that account for these errors, and use the best policy found by hill climbing to seed a branch and bound algorithm that finds the optimal policy. We show experimental results in a Gridworld domain and analyze the performance of the two algorithms. We also present human studies that verify whether our assumptions about policy execution by humans under state aliasing are reasonable.
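As a rough illustration of the distinction the abstract draws between the policy an algorithm computes and the policy a human actually executes, the following sketch evaluates a deterministic policy under a simple noisy-execution model (confusing an aliased state's action, or delaying) and improves it by hill climbing. The confusion structure, delay model, and objective here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def executed_policy_value(P, R, policy, confusion, p_delay, gamma=0.95):
    """Value of `policy` under an assumed noisy-execution model.

    P[a][s, s']  : transition probabilities of the underlying MDP
    R[s]         : reward received in state s
    policy[s]    : action prescribed in state s
    confusion[s] : dict mapping aliased state s_alias -> probability that the
                   human, while actually in s, executes policy[s_alias] instead
    p_delay      : probability of inaction, modeled here as a one-step self-loop
    (We assume p_delay plus the confusion probabilities in each state sum to < 1.)
    """
    n_states = len(policy)
    P_exec = np.zeros((n_states, n_states))
    for s in range(n_states):
        p_intended = 1.0 - p_delay - sum(confusion.get(s, {}).values())
        P_exec[s] += p_intended * P[policy[s]][s]
        for s_alias, p in confusion.get(s, {}).items():
            P_exec[s] += p * P[policy[s_alias]][s]   # action meant for the aliased state
        P_exec[s, s] += p_delay                       # delay: stay put for this step
    # Solve the Bellman equation V = R + gamma * P_exec V exactly.
    return np.linalg.solve(np.eye(n_states) - gamma * P_exec, R)

def hill_climb(P, R, confusion, p_delay, start_policy, iters=200, seed=0):
    """Greedy local search over deterministic policies: change the action at one
    state at a time and keep the change if the executed value improves."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = len(start_policy), len(P)
    policy = list(start_policy)
    best = executed_policy_value(P, R, policy, confusion, p_delay).sum()
    for _ in range(iters):
        s, a = rng.integers(n_states), rng.integers(n_actions)
        if a == policy[s]:
            continue
        candidate = policy.copy()
        candidate[s] = a
        val = executed_policy_value(P, R, candidate, confusion, p_delay).sum()
        if val > best:
            policy, best = candidate, val
    return policy, best
```

In this sketch the objective is the sum of state values; a start-state distribution could be used instead, and the policy returned by `hill_climb` would play the role of the incumbent used to prune a branch and bound search.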