Seamlessly interacting with humans or robots is hard because these agents are non-stationary. They update their policy in response to the ego agent's behavior, and the ego agent must anticipate these changes in order to co-adapt. Inspired by humans, we recognize that robots do not need to explicitly model every low-level action another agent will take; instead, we can capture the latent strategy of other agents through high-level representations. We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy. The ego agent then leverages these latent dynamics to influence the other agent, purposely guiding it towards policies suitable for co-adaptation. Across several simulated domains and a real-world air hockey game, our approach outperforms the alternatives and learns to influence the other agent.
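The pipeline the abstract describes can be sketched in code: the ego agent encodes the other agent's previous interaction into a latent strategy, predicts how that strategy will shift in response to its own planned behavior, and conditions its policy on the prediction. This is a minimal illustrative sketch, not the authors' implementation; all class names, function names, and the toy linear/tanh models are assumptions made for clarity.

```python
# Hedged sketch of the latent-strategy framework described above.
# Assumptions (not from the source): linear encoder and dynamics models,
# tanh nonlinearities, and the specific dimensions used here.
import numpy as np

rng = np.random.default_rng(0)


class LatentStrategyModel:
    """Toy stand-in for the learned encoder and latent-dynamics models."""

    def __init__(self, obs_dim, latent_dim):
        # Encoder: maps the previous interaction (a flattened summary of
        # ego states/actions) to the other agent's latent strategy z.
        self.W_enc = rng.normal(size=(latent_dim, obs_dim)) * 0.1
        # Latent dynamics: predicts the *next* strategy from the current
        # latent plus a summary of the ego agent's planned behavior.
        self.W_dyn = rng.normal(size=(latent_dim, latent_dim + obs_dim)) * 0.1

    def encode(self, trajectory):
        return np.tanh(self.W_enc @ trajectory)

    def predict_next(self, z, ego_behavior):
        return np.tanh(self.W_dyn @ np.concatenate([z, ego_behavior]))


def ego_policy(state, z_pred):
    # Ego policy conditioned on the predicted latent strategy, so actions
    # can purposely steer the other agent toward co-adaptive policies.
    return np.tanh(state + z_pred.mean())


# One interaction step: encode the last episode, anticipate the other
# agent's next strategy, and act on that anticipation.
model = LatentStrategyModel(obs_dim=4, latent_dim=2)
last_episode = rng.normal(size=4)
planned_behavior = rng.normal(size=4)

z = model.encode(last_episode)
z_next = model.predict_next(z, planned_behavior)
action = ego_policy(np.zeros(4), z_next)
```

In the full method these models would be trained with reinforcement learning so that the predicted latent dynamics reward the ego agent for influencing the other agent; the sketch only shows the inference-time data flow.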