Reinforcement learning-based policies for continuous-control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (\textbf{RE}quest help and \textbf{MOVE} on), which uses language-based feedback to adjust trained policies to real-time changes in the environment. In this work, we enable the trained policy to decide \emph{when to ask for feedback} and \emph{how to incorporate feedback into trained policies}. RE-MOVE uses epistemic uncertainty to determine the optimal time to request feedback from humans and uses language-based feedback for real-time adaptation. We perform extensive synthetic and real-world evaluations to demonstrate the benefits of our proposed approach in several test-time dynamic navigation scenarios. Our approach enables robots to learn from human feedback and adapt to previously unseen adversarial situations.
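The idea of using epistemic uncertainty to decide when to ask for feedback can be illustrated with a minimal sketch. It assumes, purely for illustration, that uncertainty is approximated by the disagreement among an ensemble of policies that each propose an action for the same observation; the abstract does not state which estimator RE-MOVE uses. The function names `epistemic_uncertainty` and `should_request_feedback` and the threshold value are hypothetical.

```python
from statistics import pvariance


def epistemic_uncertainty(ensemble_actions):
    """Ensemble disagreement (an assumed proxy, not the paper's estimator).

    ensemble_actions: list of action vectors, one per ensemble member,
    all proposed for the same observation.
    Returns the population variance across members, averaged over
    action dimensions.
    """
    dims = list(zip(*ensemble_actions))  # regroup values by action dimension
    return sum(pvariance(d) for d in dims) / len(dims)


def should_request_feedback(ensemble_actions, threshold=0.05):
    """Ask a human for help when disagreement exceeds a hypothetical threshold."""
    return epistemic_uncertainty(ensemble_actions) > threshold


# Members agree: the state resembles what the policy saw in training.
familiar = [[0.50, 0.10], [0.52, 0.11], [0.49, 0.09]]
# Members disagree strongly: the state is likely novel.
novel = [[0.9, -0.4], [-0.7, 0.6], [0.1, 0.0]]

print(should_request_feedback(familiar))
print(should_request_feedback(novel))
```

In a deployment loop, a check of this kind would run at each control step, and the robot would query the human for language feedback only when it fires. The threshold trades off autonomy against the number of help requests and would need tuning per task.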