Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that can solve multiple tasks and reach a diverse set of objectives. How to \textit{specify} and \textit{ground} these goals so that they can be reliably reached during training while also generalizing to new goals during evaluation remains an open area of research. Defining goals directly in the space of noisy, high-dimensional sensory inputs makes it difficult both to train goal-conditioned agents and to generalize to novel goals. We propose to address this by learning factorial representations of goals and passing the resulting representation through a discretization bottleneck, which yields coarser goal specifications, in an approach we call DGRL. We show that applying a discretization bottleneck can improve performance in goal-conditioned RL setups, by experimentally evaluating this method on tasks ranging from maze environments to complex robotic navigation and manipulation. Additionally, we prove a theorem that lower-bounds the expected return on out-of-distribution goals, while still allowing goals to be specified with expressive combinatorial structure.
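As a sketch of how such a discretization bottleneck can be instantiated (the encoder and codebook notation below is illustrative, not a definitive specification), consider a factor-wise vector quantization of the goal encoding:
\[
E_\phi(g) = \big(z^{(1)}, \dots, z^{(K)}\big), \qquad
\hat{z}^{(k)} = e^{(k)}_{j^\star}, \quad
j^\star = \arg\min_{j} \big\lVert z^{(k)} - e^{(k)}_{j} \big\rVert_2 ,
\]
where $E_\phi$ is a goal encoder producing $K$ factors, each factor $z^{(k)}$ is replaced by its nearest code $e^{(k)}_{j}$ from a small per-factor codebook, and the policy is conditioned on the quantized goal $\hat{z} = \big(\hat{z}^{(1)}, \dots, \hat{z}^{(K)}\big)$. With $K$ factors and $M$ codes per factor, goals can be specified as one of $M^K$ discrete combinations, which is the sense in which the representation supports expressive combinatorial structure while remaining coarse.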