Physical Interaction as Communication: Learning Robot Objectives Online from Human Corrections

When a robot performs a task next to a human, physical interaction is inevitable: the human might push, pull, twist, or guide the robot. State-of-the-art methods treat these interactions as disturbances that the robot should reject or avoid. At best, these robots respond safely while the human interacts; but after the human lets go, they simply return to their original behavior. We recognize that physical human-robot interaction (pHRI) is often intentional: the human intervenes on purpose because the robot is not doing the task correctly. In this paper, we argue that when pHRI is intentional, it is also informative: the robot can leverage interactions to learn how it should complete the rest of its current task even after the person lets go. We formalize pHRI as a dynamical system, where the human has in mind an objective function they want the robot to optimize, but the robot does not get direct access to the parameters of this objective, which are internal to the human. Within our proposed framework, human interactions become observations about the true objective. We introduce approximations to learn from and respond to pHRI in real time. We recognize that not all human corrections are perfect: users often interact with the robot noisily, and so we improve the efficiency of robot learning from pHRI by reducing unintended learning. Finally, we conduct simulations and user studies on a robotic manipulator to compare our proposed approach to the state of the art. Our results indicate that learning from pHRI leads to better task performance and improved human satisfaction.
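To make the idea of "interactions as observations about the true objective" concrete, here is a minimal sketch of one online update. It is not taken from the abstract: it assumes the objective is a cost that is linear in a small set of trajectory features, and that the robot shifts its weight estimate by the feature difference between the human-corrected trajectory and its own plan. The feature map (`features`), the step size, and all names are hypothetical choices for illustration only.

```python
import numpy as np


def features(trajectory):
    """Hypothetical feature map: total path length and mean height (z)."""
    traj = np.asarray(trajectory, dtype=float)
    path_length = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    mean_height = np.mean(traj[:, 2])
    return np.array([path_length, mean_height])


def update_objective(theta, planned, corrected, step_size=0.5):
    """Assumed update rule for a linear cost theta . features(trajectory).

    Features that the correction reduced get a larger cost weight,
    features that it increased get a smaller one.
    """
    feature_change = features(corrected) - features(planned)
    return theta - step_size * feature_change


# The robot plans a level path at z = 1.0; the human pushes the middle
# waypoint down to z = 0.5 (the correction).
planned = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
corrected = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.5), (2.0, 0.0, 1.0)]

theta = np.array([1.0, 0.0])  # initially the robot only cares about path length
theta_new = update_objective(theta, planned, corrected)
print(theta_new)
```

Under these assumptions, a downward push raises the weight on height, so a planner minimizing the updated cost would keep the arm lower for the rest of the task instead of returning to its original path. A full system would also need to replan with the new weights and to filter noisy corrections, which this sketch does not attempt.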
