Object rearrangement is challenging for embodied agents because solving such tasks requires generalizing across a combinatorially large set of configurations of entities and their locations. Worse, the representations of these entities are unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured visual inputs. By constructing a factorized transition graph over clusters of entity representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, and that outperforms current offline deep RL methods when evaluated on simulated rearrangement tasks.
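To make the core idea concrete, here is a minimal, illustrative sketch (not the authors' implementation) of building a factorized transition graph over clustered entity representations and planning over it. All names (`EntityTransition`, `build_factorized_graph`, `plan_entity`), the choice of k-means for clustering, and the breadth-first planner are assumptions made for illustration only; the paper's actual representation-learning and control components are not reproduced here.

```python
# Sketch: cluster per-entity representations into discrete abstract states,
# then build a factorized transition graph whose edges record which actions
# move an entity from one abstract state to another. Toy stand-in only.
from collections import defaultdict, deque
from dataclasses import dataclass

import numpy as np
from sklearn.cluster import KMeans


@dataclass
class EntityTransition:
    """One observed transition for a single entity: (z_t, action, z_{t+1})."""
    z_before: np.ndarray  # entity representation inferred from pixels at time t
    action: int           # discrete action applied to that entity
    z_after: np.ndarray   # entity representation at time t+1


def build_factorized_graph(transitions, n_abstract_states=8, seed=0):
    """Cluster entity representations and build a per-entity transition graph.

    Because the graph is defined over abstract entity states rather than full
    scene configurations, it is shared across entities and scene sizes, which
    is what enables combinatorial generalization in this sketch.
    """
    reps = np.concatenate(
        [[t.z_before, t.z_after] for t in transitions], axis=0
    )
    kmeans = KMeans(n_clusters=n_abstract_states, n_init=10, random_state=seed)
    kmeans.fit(reps)

    # graph[state][action] -> set of successor abstract states
    graph = defaultdict(lambda: defaultdict(set))
    for t in transitions:
        s = int(kmeans.predict(t.z_before[None])[0])
        s_next = int(kmeans.predict(t.z_after[None])[0])
        graph[s][t.action].add(s_next)
    return kmeans, graph


def plan_entity(graph, start_state, goal_state):
    """Breadth-first search over the abstract graph for one entity.

    Returns a list of actions; executing them on the corresponding object in
    the environment stands in for the learned model-to-environment
    correspondence described in the abstract.
    """
    frontier = deque([(start_state, [])])
    visited = {start_state}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal_state:
            return plan
        for action, successors in graph[state].items():
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [action]))
    return None  # goal not reachable in the learned graph


if __name__ == "__main__":
    # Random vectors standing in for entity representations inferred from pixels.
    rng = np.random.default_rng(0)
    toy = [
        EntityTransition(rng.normal(size=4), int(rng.integers(3)), rng.normal(size=4))
        for _ in range(200)
    ]
    kmeans, graph = build_factorized_graph(toy, n_abstract_states=4)
    print(plan_entity(graph, start_state=0, goal_state=3))
```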