Robot social navigation is influenced by human preferences and by environment-specific scenarios such as elevators and doors, so it must be adaptable by end users. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, both have fundamental limitations. Model-based approaches require tuning of constraints and parameters to adapt to preferences and new scenarios. Learning-based approaches require reward functions and significant training data, and they are hard to adapt to new social scenarios or to new domains with limited demonstrations. In this work, we propose Iterative Dimension Informed Program Synthesis (IDIPS), which addresses these limitations by learning and adapting social navigation behavior in the form of human-readable symbolic programs. IDIPS combines program synthesis, parameter optimization, predicate repair, and iterative human demonstration to learn and adapt model-free action selection policies from orders of magnitude less data than learning-based approaches require. We introduce a novel predicate repair technique that accommodates previously unseen social scenarios or preferences by growing existing policies. Experimental results show that IDIPS: 1) synthesizes effective policies that model user preferences, 2) adapts existing policies to changing preferences, 3) extends policies to handle novel social scenarios such as locked doors, and 4) generates policies that transfer from simulation to real-world robots with minimal effort.
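To make "human-readable symbolic program" and "growing existing policies" concrete, here is a minimal sketch under loudly stated assumptions. It is not the IDIPS algorithm. It omits synthesis of the program structure and the dimension-informed search entirely. All names (`Rule`, `Policy`, `fit_rule`, `repair`), the features `dist` and `locked`, and the actions `go`/`stop`/`wait` are hypothetical. The policy is an ordered if/elif/else program over threshold predicates. "Parameter optimization" is stood in for by a search over midpoints between demonstrated feature values. "Repair" is a simplified stand-in that prepends one new rule when doing so improves agreement with the demonstrations.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = Dict[str, float]   # hypothetical feature vector, e.g. {"dist": 0.5}
Demo = Tuple[State, str]   # (state, demonstrated action)


@dataclass
class Rule:
    """One branch of a symbolic policy: if the predicate holds, take `action`."""
    feature: str
    threshold: float
    action: str
    above: bool = False  # True: feature > threshold; False: feature < threshold

    def fires(self, state: State) -> bool:
        value = state[self.feature]
        return value > self.threshold if self.above else value < self.threshold


@dataclass
class Policy:
    """An ordered if/elif/else program over state features."""
    rules: List[Rule]
    default: str

    def act(self, state: State) -> str:
        for rule in self.rules:
            if rule.fires(state):
                return rule.action
        return self.default

    def __str__(self) -> str:
        lines = []
        for i, r in enumerate(self.rules):
            op = ">" if r.above else "<"
            kw = "if" if i == 0 else "elif"
            lines.append(f"{kw} {r.feature} {op} {r.threshold:.2f}: {r.action}")
        lines.append(f"else: {self.default}")
        return "\n".join(lines)


def accuracy(policy: Policy, demos: List[Demo]) -> float:
    """Fraction of demonstrations whose action the policy reproduces."""
    return sum(policy.act(s) == a for s, a in demos) / len(demos)


def fit_rule(policy: Policy, idx: int, demos: List[Demo]) -> None:
    """Stand-in parameter optimization: pick the best midpoint threshold for one rule."""
    rule = policy.rules[idx]
    vals = sorted({s[rule.feature] for s, _ in demos})
    candidates = ([vals[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
                  + [vals[-1] + 1.0])

    def score(t: float) -> float:
        rule.threshold = t
        return accuracy(policy, demos)

    rule.threshold = max(candidates, key=score)  # first best candidate wins ties


def repair(policy: Policy, demos: List[Demo],
           features: List[str], actions: List[str]) -> Policy:
    """Stand-in repair: prepend one new rule, keeping existing rules unchanged."""
    best, best_acc = policy, accuracy(policy, demos)
    for feature in features:
        for action in actions:
            for above in (False, True):
                old = [Rule(r.feature, r.threshold, r.action, r.above) for r in policy.rules]
                cand = Policy([Rule(feature, 0.0, action, above)] + old, policy.default)
                fit_rule(cand, 0, demos)  # only the new rule's threshold is fitted
                acc = accuracy(cand, demos)
                if acc > best_acc:
                    best, best_acc = cand, acc
    return best


demos = [
    ({"dist": 0.5, "locked": 0.0}, "stop"),
    ({"dist": 0.8, "locked": 0.0}, "stop"),
    ({"dist": 2.0, "locked": 0.0}, "go"),
    ({"dist": 3.0, "locked": 0.0}, "go"),
    ({"dist": 6.0, "locked": 0.0}, "go"),
]
policy = Policy([Rule("dist", 0.0, "stop")], default="go")
fit_rule(policy, 0, demos)
print(policy)  # if dist < 1.40: stop / else: go

# A novel scenario (a locked door), demonstrated once.
demos.append(({"dist": 5.0, "locked": 1.0}, "wait"))
policy = repair(policy, demos, ["dist", "locked"], ["go", "stop", "wait"])
print(policy)  # if locked > 0.50: wait / elif dist < 1.40: stop / else: go
```

The design choice worth noting is that repair only adds a branch and fits its one parameter, so the previously learned `dist` rule survives unchanged and the result stays readable as a short program. A real system would also need to guard against spurious rules. In this sketch, the extra unlocked demonstration at `dist` 6.0 is what rules out a `dist > 4.0: wait` branch.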