Today's robots increasingly interact with people and need to efficiently learn inexperienced users' preferences. A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. While this minimizes the user's effort, a strict choice does not yield any information on how much one trajectory is preferred over the other. We propose scale feedback, where the user utilizes a slider to give more nuanced information. We introduce a probabilistic model of how users would provide feedback and derive a learning framework for the robot. We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies, which suggest that scale feedback enables more effective learning in practice.