In this paper, we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are too computationally expensive. Our proposed approach, dubbed PoL-MDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach over state-of-the-art approaches in several simulated scenarios of varying complexity. We also showcase the use of our legible policies as demonstrations for an inverse reinforcement learning agent, establishing their superiority over the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study in which people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.