The ability to plan actions at multiple levels of abstraction enables intelligent agents to solve complex tasks effectively. However, learning the models needed for both low- and high-level planning from demonstrations has proven challenging, especially with high-dimensional inputs. To address this issue, we propose using reinforcement learning to identify subgoals in expert trajectories, tying the magnitude of the reward to the predictability of the low-level actions given the state and the chosen subgoal. We then build a vector-quantized generative model over the identified subgoals to perform subgoal-level planning. In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art approaches. Because of its ability to plan, our algorithm can find trajectories that are better than those in the training set.
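The two core ingredients described above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: `predictability_reward` stands in for a reward proportional to how well a low-level policy predicts the expert action given the state and subgoal, and `quantize_subgoal` shows the vector-quantization step that snaps a continuous subgoal embedding to its nearest codebook entry.

```python
import numpy as np

def predictability_reward(log_prob_fn, state, action, subgoal):
    # Hypothetical reward signal: higher when the expert's low-level
    # action is more predictable (higher log-likelihood) given the
    # state and the chosen subgoal.
    return log_prob_fn(state, action, subgoal)

def quantize_subgoal(embedding, codebook):
    # Vector quantization: map a continuous subgoal embedding to the
    # nearest codebook entry under squared Euclidean distance.
    dists = np.sum((codebook - embedding) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Toy codebook of three discrete subgoal codes in a 2-D embedding space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
idx, code = quantize_subgoal(np.array([0.9, 1.2]), codebook)
```

A planner operating over the discrete codes can then search subgoal sequences instead of raw actions, which is what makes the long-horizon planning tractable.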