Demonstration-Enhanced Adaptable Multi-Objective Robot Navigation

Preference-aligned robot navigation in human environments is typically achieved through learning-based approaches that use user feedback or demonstrations for personalization. However, personal preferences can change over time and may be context-dependent. Traditional reinforcement learning (RL) approaches with static reward functions often fail to adapt to such evolving preferences: once training is complete, the policy remains fixed to the behavior shown in the demonstrations. This paper introduces a structured framework that combines demonstration-based learning with multi-objective reinforcement learning (MORL). To ensure real-world applicability, our approach allows the robot navigation policy to adapt dynamically to changing user preferences without retraining. It smoothly modulates how strongly the policy reflects the demonstration data, as well as other preference-related objectives. Through rigorous evaluations, including a baseline comparison and sim-to-real transfer on two robots, we demonstrate that our framework adapts to user preferences accurately while achieving high navigational performance in terms of collision avoidance and goal pursuit.
