Constructing a diverse repertoire of manipulation skills in a scalable fashion remains an unsolved challenge in robotics. One way to address this challenge is with unstructured human play, where humans operate freely in an environment to reach unspecified goals. Play is a simple and cheap method for collecting diverse user demonstrations with broad state and goal coverage over an environment. Due to this diverse coverage, existing approaches for learning from play are more robust to online policy deviations from the offline data distribution. However, these methods often struggle to learn under scene variation and on challenging manipulation primitives, due in part to improperly associating complex behaviors with the scene changes they induce. Our insight is that an object-centric view of play data can help link human behaviors and the resulting changes in the environment, and thus improve multi-task policy learning. In this work, we construct a latent space to model object affordances -- properties of an object that define its uses -- in the environment, and then learn a policy to achieve the desired affordances. By modeling and predicting the desired affordance across variable-horizon tasks, our method, Predicting Latent Affordances Through Object-Centric Play (PLATO), outperforms existing methods on complex manipulation tasks in both 2D and 3D simulated object manipulation environments and in real-world environments, across diverse types of interactions. Videos can be found on our website: https://tinyurl.com/4u23hwfv