Building generalizable goal-conditioned agents from rich observations is key to reinforcement learning (RL) solving real-world problems. Traditionally in goal-conditioned RL, an agent is provided with the exact goal it intends to reach. However, it is often not realistic to know the configuration of the goal before performing a task. A more scalable framework would instead provide the agent with an example of an analogous task and have the agent infer what the goal should be for its current state. We propose a new form of state abstraction called goal-conditioned bisimulation that captures functional equivariance, allowing for the reuse of skills to achieve new goals. We learn this representation using a metric form of this abstraction and show its ability to generalize to new goals in simulated manipulation tasks. Further, we prove that this learned representation is sufficient not only for goal-conditioned tasks but also for any downstream task described by a state-only reward function. Videos can be found at https://sites.google.com/view/gc-bisimulation.
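As a point of reference (not stated in the abstract itself), a metric form of such an abstraction is commonly written as the standard on-policy bisimulation-metric recursion extended with goal conditioning, where two state-goal pairs are close if they receive similar rewards and have similar transition distributions under the metric; the exact formulation used in the paper may differ:

\[
d\big((s_i, g_i), (s_j, g_j)\big) \;=\; \big| R(s_i, g_i) - R(s_j, g_j) \big| \;+\; \gamma\, W_1\big(\mathcal{P}^{\pi}(\cdot \mid s_i),\, \mathcal{P}^{\pi}(\cdot \mid s_j);\, d\big),
\]

with a representation $\phi$ then trained so that embedding distances match the metric, e.g. $\|\phi(s_i, g_i) - \phi(s_j, g_j)\|_1 \approx d\big((s_i, g_i), (s_j, g_j)\big)$.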