Socially aware robot navigation, in which a robot is required to optimize its trajectory to maintain comfortable and compliant spatial interactions with humans in addition to reaching its goal without collisions, is a fundamental yet challenging task in the context of human-robot interaction. While existing learning-based methods have achieved better performance than the preceding model-based ones, they still have drawbacks: reinforcement learning depends on a handcrafted reward that is unlikely to effectively quantify broad social compliance and can lead to reward exploitation problems; meanwhile, inverse reinforcement learning suffers from the need for expensive human demonstrations. In this paper, we propose a feedback-efficient active preference learning approach, FAPL, that distills human comfort and expectation into a reward model to guide the robot agent in exploring latent aspects of social compliance. We further introduce hybrid experience learning to improve the efficiency of human feedback and samples, and evaluate the benefits of robot behaviors learned from FAPL through extensive simulation experiments and a user study (N=10) in which a physical robot navigates among human subjects in real-world scenarios. The source code and experiment videos for this work are available at: https://sites.google.com/view/san-fapl.