When humans are given a policy to execute, uncertainty in identifying the current state can lead to execution errors and deviations from the policy. An algorithm that computes a policy for a human to execute should therefore account for these effects. An optimal MDP policy that is poorly executed by a human agent may be much worse than another policy that is executed with fewer errors. In this paper, we consider the problems of erroneous execution and execution delay when computing policies for a human agent acting in a setting modeled by a Markov Decision Process. We present a framework that models the likelihood of policy execution errors and of non-policy actions such as inaction (delays) due to state uncertainty. We then give a hill-climbing algorithm that searches for good policies that account for these errors, and we use the best policy found by hill climbing in a branch-and-bound algorithm to find the optimal policy. We show experimental results in a Gridworld domain and analyze the performance of the two algorithms. We also present human studies that test whether our assumptions about policy execution by humans under state aliasing are reasonable.
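The idea of searching for a policy that is robust to execution errors can be illustrated with a minimal sketch. The abstract does not specify the error model, so the code below assumes one for illustration: in state `s`, the human executes a uniformly random action with probability `err[s]` and the prescribed action otherwise. The names `evaluate`, `hill_climb`, `P`, `R`, and `err`, the three-state toy MDP, and the first-improvement search are all hypothetical and are not taken from the paper. Delays (inaction) and the branch-and-bound stage are not modeled here.

```python
def evaluate(policy, P, R, err, gamma=0.95, sweeps=500):
    """Discounted value of start state 0 when `policy` is executed with slips.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    immediate reward. err[s] is the ASSUMED probability that the human,
    unsure of state s, picks a uniformly random action instead of policy[s].
    """
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(sweeps):
        new_V = []
        for s in range(n_states):
            total = 0.0
            for a in range(n_actions):
                # Probability that action a is actually executed in state s.
                p_a = err[s] / n_actions
                if a == policy[s]:
                    p_a += 1.0 - err[s]
                q = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                total += p_a * q
            new_V.append(total)
        V = new_V
    return V[0]


def hill_climb(P, R, err, start, gamma=0.95):
    """First-improvement hill climbing over single-state action changes."""
    policy = list(start)
    best = evaluate(policy, P, R, err, gamma)
    improved = True
    while improved:
        improved = False
        for s in range(len(P)):
            for a in range(len(P[0])):
                if a == policy[s]:
                    continue
                candidate = policy[:s] + [a] + policy[s + 1:]
                value = evaluate(candidate, P, R, err, gamma)
                if value > best + 1e-9:  # accept only strict improvements
                    policy, best, improved = candidate, value, True
    return policy, best


# Hypothetical toy MDP: state 2 is absorbing with zero reward.
# State 0: action 0 is "safe" (reward 1, then done); action 1 moves to state 1.
# State 1: action 0 pays 10, action 1 costs 20; both then end in state 2.
P = [
    [[(2, 1.0)], [(1, 1.0)]],
    [[(2, 1.0)], [(2, 1.0)]],
    [[(2, 1.0)], [(2, 1.0)]],
]
R = [
    [1.0, 0.0],
    [10.0, -20.0],
    [0.0, 0.0],
]

# State 1 is assumed hard to identify, so slips are likely there.
robust_policy, robust_value = hill_climb(P, R, err=[0.0, 0.8, 0.0], start=[1, 0, 0])
print(robust_policy, robust_value)
```

In this toy setting, the error-free optimum routes through state 1, but once slips in state 1 are likely the search prefers the safe action in state 0. This mirrors the abstract's point that a nominally optimal policy can be worse than one that is executed with fewer errors.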