Synchronizing expectations and knowledge about the state of the world is an essential capability for effective collaboration. For robots to collaborate effectively with humans and other autonomous agents, it is critical that they be able to generate intelligible explanations that reconcile differences between their understanding of the world and that of their collaborators. In this work, we present Single-shot Policy Explanation for Augmenting Rewards (SPEAR), a novel sequential optimization algorithm that uses semantic explanations derived from combinations of planning predicates to augment agents' reward functions, driving their policies toward more optimal behavior. We experimentally validate our algorithm's policy manipulation capabilities in two practically grounded applications and conclude with a performance analysis of SPEAR on domains with increasingly large state spaces and predicate counts. We demonstrate that our method substantially improves over the state of the art in both runtime and addressable problem size, enabling an agent to leverage its own expertise to communicate actionable information that improves another agent's performance.
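To make the central idea concrete, the following is a minimal sketch (not the paper's published implementation; the function names, predicate names, and weights are all illustrative assumptions) of how a reward function might be augmented with weighted predicate terms, i.e. R'(s) = R(s) + Σᵢ wᵢ·pᵢ(s), where each pᵢ is a boolean planning predicate drawn from an explanation:

```python
# Illustrative sketch only: SPEAR's actual interfaces are not specified here.
# The idea shown is reward augmentation via weighted planning predicates,
# R'(s) = R(s) + sum_i w_i * p_i(s), where explanation selection would
# choose the predicates p_i and weights w_i (assumed given below).

def augment_reward(base_reward, predicates, weights):
    """Return a reward function shaped by weighted predicate terms.

    base_reward: callable state -> float (the agent's original reward)
    predicates:  list of callables state -> bool (planning predicates)
    weights:     list of floats (hypothetical outputs of the
                 explanation-selection step)
    """
    def shaped_reward(state):
        bonus = sum(w * float(p(state)) for p, w in zip(predicates, weights))
        return base_reward(state) + bonus
    return shaped_reward

# Hypothetical example: penalize states where the agent is both holding a
# fragile object and moving fast (predicate names invented for illustration).
risky = lambda s: s.get("holding_fragile_object", False) and s.get("moving_fast", False)
r_aug = augment_reward(lambda s: s.get("reward", 0.0), [risky], [-5.0])

print(r_aug({"reward": 1.0, "holding_fragile_object": True, "moving_fast": True}))
# -> -4.0: the predicate penalty steers the policy away from such states
```

Under this reading, communicating the predicate-weight pairs is the explanation itself: the recipient can apply them to its own reward function and re-plan, without access to the explainer's full policy.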