Humans are skillful navigators: we maneuver aptly through new places, recognize when we have returned to a location we have seen before, and can even conceive of shortcuts through parts of our environment we have never visited. Current methods in model-based reinforcement learning, on the other hand, struggle to generalize about environment dynamics outside the training distribution. We argue that two principles can help bridge this gap: latent learning and parsimonious dynamics. Humans tend to think about environment dynamics in simple terms: we reason about trajectories not in terms of what we expect to see along a path, but in an abstract latent space that contains information about the places' spatial coordinates. Moreover, we assume that moving around in novel parts of our environment works the same way as in parts we are familiar with. These two principles work in tandem: it is in the latent space that the dynamics show parsimonious characteristics. We develop a model that learns such parsimonious dynamics. Using a variational objective, our model is trained to reconstruct experienced transitions in a latent space using locally linear transformations, while being encouraged to invoke as few distinct transformations as possible. Using our framework, we demonstrate the utility of learning parsimonious latent dynamics models in a range of policy learning and planning tasks.
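To make the idea of locally linear latent transitions with a parsimony pressure concrete, the following is a minimal NumPy sketch and not the model described above. It assumes a hypothetical bank of K linear maps (`A`, `b`), soft selection weights obtained from `logits` (which would come from a learned network in practice), and an entropy penalty on average usage as one possible stand-in for "invoke as few distinct transformations as possible". The encoder, decoder, and variational objective are omitted, and all names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict_next_latent(z, logits, A, b):
    """Predict next latents as a soft mixture of K linear maps.

    z:      (n, d) current latent states
    logits: (n, K) unnormalized selection scores (assumed given here)
    A:      (K, d, d) linear maps (hypothetical transformation bank)
    b:      (K, d) offsets
    """
    w = softmax(logits)                                # (n, K) selection weights
    candidates = np.einsum("kij,nj->nki", A, z) + b    # (n, K, d): A_k z + b_k
    z_next = np.einsum("nk,nki->ni", w, candidates)    # weighted combination
    return z_next, w

def parsimony_penalty(w, eps=1e-12):
    """Entropy of the average usage of the K transformations.

    Lower values mean fewer distinct transformations are used overall.
    This is an assumed surrogate, not the paper's regularizer.
    """
    usage = w.mean(axis=0)
    return float(-(usage * np.log(usage + eps)).sum())
```

In this sketch, adding `parsimony_penalty(w)` to a latent reconstruction loss such as the squared error between `z_next` and the encoded next observation would push the selection weights toward reusing a small number of maps across the whole batch.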