Constructing a diverse repertoire of manipulation skills in a scalable fashion remains an unsolved challenge in robotics. One way to address this challenge is with unstructured human play, where humans operate freely in an environment to reach unspecified goals. Play is a simple and cheap method for collecting diverse user demonstrations with broad state and goal coverage over an environment. Due to this diverse coverage, existing approaches for learning from play are more robust to online policy deviations from the offline data distribution. However, these methods often struggle to learn under scene variation and on challenging manipulation primitives, due in part to improperly associating complex behaviors with the scene changes they induce. Our insight is that an object-centric view of play data can help link human behaviors to the resulting changes in the environment, and thus improve multi-task policy learning. In this work, we construct a latent space to model object \textit{affordances} -- properties of an object that define its uses -- in the environment, and then learn a policy to achieve the desired affordances. By modeling and predicting the desired affordance across variable-horizon tasks, our method, Predicting Latent Affordances Through Object-Centric Play (PLATO), outperforms existing methods on complex manipulation tasks in 2D and 3D simulated object manipulation environments and in real-world environments, across diverse types of interactions. Videos can be found on our website: https://tinyurl.com/4u23hwfv