To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.