The ability to predict and plan into the future is fundamental for agents acting in the world. To reach a faraway goal, we predict trajectories at multiple timescales, first devising a coarse plan towards the goal and then gradually filling in details. In contrast, current learning approaches for visual prediction and planning fail on long-horizon tasks because they generate predictions (1) without considering goal information, and (2) at the finest temporal resolution, one step at a time. In this work, we propose a framework for visual prediction and planning that overcomes both of these limitations. First, we formulate the problem of predicting towards a goal and propose the corresponding class of latent-space goal-conditioned predictors (GCPs). GCPs significantly improve planning efficiency by constraining the search space to only those trajectories that reach the goal. Further, we show how GCPs can be naturally formulated as hierarchical models that, given two observations, predict an observation between them and, by recursively subdividing each part of the trajectory, generate complete sequences. This divide-and-conquer strategy is effective at long-term prediction and enables us to design a hierarchical planning algorithm that optimizes trajectories in a coarse-to-fine manner. We show that by using both goal-conditioning and hierarchical prediction, GCPs enable us to solve visual planning tasks with much longer horizons than previously possible.
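The recursive subdivision described in the abstract can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the abstract does not specify the model, so `midpoint_predictor` is a hypothetical stand-in (plain linear interpolation) for a learned latent-space predictor, and the names `predict_between` and `predict_sequence` are invented here for illustration.

```python
from typing import Callable, List

Observation = List[float]
Predictor = Callable[[Observation, Observation], Observation]


def midpoint_predictor(start: Observation, end: Observation) -> Observation:
    # ASSUMPTION: placeholder for a learned goal-conditioned predictor.
    # It simply averages the two observations element-wise.
    return [(a + b) / 2.0 for a, b in zip(start, end)]


def predict_between(start: Observation, end: Observation, depth: int,
                    predictor: Predictor) -> List[Observation]:
    """Return the observations strictly between start and end, in temporal order."""
    if depth == 0:
        return []
    # Predict one observation between the two endpoints ...
    mid = predictor(start, end)
    # ... then recursively subdivide each half of the trajectory.
    left = predict_between(start, mid, depth - 1, predictor)
    right = predict_between(mid, end, depth - 1, predictor)
    return left + [mid] + right


def predict_sequence(start: Observation, goal: Observation, depth: int,
                     predictor: Predictor = midpoint_predictor) -> List[Observation]:
    """Full sequence from start to goal with 2**depth - 1 predicted frames in between."""
    return [list(start)] + predict_between(start, goal, depth, predictor) + [list(goal)]


if __name__ == "__main__":
    for obs in predict_sequence([0.0], [8.0], depth=3):
        print(obs)
```

With a recursion depth of `d`, the sketch produces `2**d - 1` intermediate observations, so the first levels of the recursion yield a coarse outline of the trajectory and deeper levels fill in the details.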