This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; because the human must carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method requires only directional corrections, that is, corrections that indicate the direction of an input change without indicating its magnitude. We assume only that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, whereas magnitude corrections must lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function using a cutting-plane method, which has a geometric interpretation. We establish theoretical results showing the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that it is significantly more effective (higher success rate), more efficient and less effortful (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than state-of-the-art robot learning frameworks.
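To make the geometric idea concrete, the following is a minimal sketch of a cutting-plane update driven by directional corrections. It is a toy under stated assumptions, not the paper's algorithm: the objective is assumed to be J(u) = ||u - theta||^2 in two dimensions, so the robot's motion coincides with its current parameter estimate; the feasible set is approximated by a finite integer grid of candidate parameters; and the next estimate is taken as the centroid of the surviving candidates. The function names (`centroid`, `simulated_direction`, `cut`, `learn`) and the simulated human are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean of the remaining candidates (a crude stand-in for a polytope center)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def simulated_direction(estimate, theta_true, rng):
    """Hypothetical human: any direction within 80 degrees of the improving
    direction, scaled by an arbitrary magnitude that the learner never uses."""
    dx, dy = theta_true[0] - estimate[0], theta_true[1] - estimate[1]
    angle = math.radians(rng.uniform(-80.0, 80.0))
    scale = rng.uniform(0.1, 10.0)
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * dx - s * dy), scale * (s * dx + c * dy))


def cut(points, estimate, direction):
    """Keep only candidates strictly on the side of the hyperplane through the
    current estimate that the correction points toward."""
    ax, ay = direction
    return [p for p in points
            if ax * (p[0] - estimate[0]) + ay * (p[1] - estimate[1]) > 0.0]


def learn(theta_true, grid_radius=10, max_corrections=1000, seed=0):
    """Repeat: ask for a direction, cut the candidate set, re-center."""
    rng = random.Random(seed)
    candidates = [(x, y)
                  for x in range(-grid_radius, grid_radius + 1)
                  for y in range(-grid_radius, grid_radius + 1)]
    estimate = centroid(candidates)
    used = 0
    while estimate != theta_true and used < max_corrections:
        direction = simulated_direction(estimate, theta_true, rng)
        candidates = cut(candidates, estimate, direction)
        estimate = centroid(candidates)
        used += 1
    return estimate, candidates, used


if __name__ == "__main__":
    estimate, candidates, used = learn((7, -4))
    print("estimate:", estimate, "corrections used:", used)
```

In this toy, any direction with a positive inner product with the true improving direction is accepted, which mirrors the half-space of allowable corrections described above. Each cut passes through the current estimate, so the estimate itself is always discarded while the true parameter is always kept. A continuous version would maintain the half-space constraints explicitly and compute a center of the resulting polytope instead of averaging grid points.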