Despite significant advancements in the field of multi-agent navigation, agents still lack the sophistication and intelligence that humans exhibit in multi-agent settings. In this paper, we propose a framework for learning a human-like, general collision avoidance policy for agent-agent interactions in fully decentralized, multi-agent environments. Our approach uses knowledge distillation with reinforcement learning to shape the reward function based on expert policies extracted from human trajectory demonstrations through behavior cloning. We show that agents trained with our approach can produce human-like trajectories in collision avoidance and goal-directed steering tasks not covered by the demonstrations, outperforming the experts as well as learning-based agents trained without knowledge distillation.
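To make the reward-shaping idea concrete, the sketch below shows one minimal way a task reward could be combined with a knowledge-distillation bonus measuring agreement with a behavior-cloned expert's action. The function name, weight `w`, and the exponential similarity kernel are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def shaped_reward(task_reward, agent_action, expert_action, w=0.5):
    """Illustrative distillation-shaped reward (assumed form, not the
    paper's): the environment's task reward plus a bonus in (0, 1]
    that grows as the agent's action approaches the expert's."""
    diff = np.asarray(agent_action, dtype=float) - np.asarray(expert_action, dtype=float)
    # Similarity kernel: 1.0 when the actions match, decaying with distance.
    similarity = np.exp(-np.linalg.norm(diff))
    return task_reward + w * similarity
```

Under this sketch, an agent that exactly matches the expert's action receives the full bonus `w`, while actions far from the expert's contribute almost nothing, so the reinforcement learner is nudged toward human-like behavior without being hard-constrained to imitate it.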