Physical interactions can often help reveal information that is not readily apparent. For example, we may tug at a table leg to evaluate whether it is built well, or turn a water bottle upside down to check that it is watertight. We propose to train robots to acquire such interactive behaviors automatically, for the purpose of evaluating the result of an attempted robotic skill execution. These evaluations in turn serve as "interactive reward functions" (IRFs) for training reinforcement learning policies to perform the target skill, such as screwing the table leg tightly. In addition, even after task policies are fully trained, IRFs can serve as verification mechanisms that improve online task execution. For any given task, our IRFs can be conveniently trained using only examples of successful outcomes, and no further specification is needed to train the task policy thereafter. In our evaluations on door locking and weighted block stacking in simulation, and screw tightening on a real robot, IRFs enable large performance improvements, even outperforming baselines with access to demonstrations or carefully engineered rewards. Project website: https://sites.google.com/view/lirf-corl-2022/
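To make the abstract's two-stage idea concrete, the sketch below shows one plausible (hypothetical, simplified) form of an example-based IRF: a success classifier trained on observations gathered while probing known-good outcomes, whose output probability is then used as a reward signal for the task policy. The class name, feature shapes, and data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an example-based interactive reward function (IRF).
# All names, shapes, and data here are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression


class ExampleBasedIRF:
    """Scores an attempted outcome by probing it and classifying the probe observations."""

    def __init__(self):
        self.clf = LogisticRegression()

    def fit(self, success_probe_obs, other_probe_obs):
        # success_probe_obs: observations collected while probing known-successful outcomes
        # other_probe_obs:   observations collected while probing arbitrary/failed outcomes
        X = np.vstack([success_probe_obs, other_probe_obs])
        y = np.concatenate([np.ones(len(success_probe_obs)),
                            np.zeros(len(other_probe_obs))])
        self.clf.fit(X, y)

    def reward(self, probe_obs):
        # Probability that the probed outcome resembles the success examples;
        # this scalar can serve as the reward for the task policy.
        return float(self.clf.predict_proba(probe_obs.reshape(1, -1))[0, 1])


# Toy usage: after the task policy finishes an attempt (e.g. screwing a table leg),
# run the probing behavior (e.g. tugging on the leg), collect an observation
# vector, and score it with the IRF.
irf = ExampleBasedIRF()
irf.fit(np.random.randn(100, 8) + 1.0, np.random.randn(100, 8))  # toy data
print(irf.reward(np.random.randn(8)))
```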