Deep-learning-based planning methods often rely on learned representations that are optimized for unrelated tasks; for example, they may be trained to reconstruct the environment. These representations are then combined with predictor functions that simulate rollouts to navigate the environment. We find this principle of representation learning unsatisfying and propose instead to learn representations that are directly optimized for the task at hand: to be maximally predictable for the predictor function. This yields representations that are by design optimal for the downstream task of planning, where the learned predictor function is used as a forward model. To this end, we propose a new way of jointly learning this representation along with the prediction function, a system we dub Latent Representation Prediction Network (LARP). The prediction function is used as a forward model for search on a graph in a viewpoint-matching task, and the representation learned to maximize predictability is found to outperform a pre-trained representation. Our approach is shown to be more sample-efficient than standard reinforcement learning methods, and our learned representation transfers successfully to dissimilar objects.
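The core idea of jointly training an encoder and a latent predictor so that the representation is maximally predictable can be sketched in a toy linear setting. This is a minimal illustration under assumed simplifications, not the paper's actual architecture: the encoder phi(s) = W s and predictor f(z) = A z are linear, the prediction target phi(s_{t+1}) is treated as detached (no gradient flows through it), and encoder rows are renormalized each step as a crude guard against the trivial collapse phi = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: random states s_t and next states s_{t+1} under unknown
# linear dynamics T (all of this is illustrative, not from the paper).
d_obs, d_lat, n = 8, 3, 256
S = rng.normal(size=(n, d_obs))              # states s_t
T = rng.normal(size=(d_obs, d_obs)) * 0.3    # hidden dynamics
S_next = S @ T.T                             # states s_{t+1}

# Linear encoder phi(s) = W s and linear latent predictor f(z) = A z.
W = rng.normal(size=(d_lat, d_obs))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm encoder rows
A = rng.normal(size=(d_lat, d_lat)) * 0.1
lr = 0.05

def loss(W, A):
    # Latent prediction error: || f(phi(s_t)) - phi(s_{t+1}) ||^2
    err = (S @ W.T) @ A.T - S_next @ W.T
    return float(np.mean(err ** 2))

losses = [loss(W, A)]
for _ in range(200):
    Z = S @ W.T                              # phi(s_t)
    Z_next = S_next @ W.T                    # phi(s_{t+1}), used as a detached target
    err = Z @ A.T - Z_next
    dA = 2 / n * err.T @ Z                   # gradient w.r.t. predictor
    dW = 2 / n * (err @ A).T @ S             # gradient w.r.t. encoder (prediction branch only)
    A -= lr * dA
    W -= lr * dW
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # re-normalize to avoid collapse
    losses.append(loss(W, A))

print(losses[0], losses[-1])  # latent prediction error shrinks over training
```

Once trained, the predictor A can be applied repeatedly to a latent state to simulate rollouts, which is how a learned forward model supports search over candidate action sequences.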