Learning robot objective functions from human input has become increasingly important, but state-of-the-art techniques assume that the human's desired objective lies within the robot's hypothesis space. When this assumption does not hold, even methods that maintain uncertainty over the objective fail, because they reason about which hypothesis might be correct, not about whether any of the hypotheses are correct. We focus specifically on learning from physical human corrections during the robot's task execution, where an insufficiently rich hypothesis space leads the robot to update its objective in ways the person did not actually intend. We observe that such corrections appear irrelevant to the robot, because they are not the best way of achieving any of its candidate objectives. Rather than naively trusting and learning from every human interaction, we propose that robots learn conservatively by reasoning in real time about how relevant the human's correction is to the robot's hypothesis space. We test our inference method on human interaction data, and demonstrate in an in-person user study with a 7-DoF robot manipulator that it alleviates unintended learning.