Effective exploration is a challenge in reinforcement learning (RL). Novelty-based exploration methods can suffer in high-dimensional state spaces, such as continuous, partially observable 3D environments. We address this challenge by defining novelty using semantically meaningful state abstractions, which can be found in learned representations shaped by natural language. In particular, we evaluate vision-language representations pretrained on natural image captioning datasets. We show that these pretrained representations drive meaningful, task-relevant exploration and improve performance in simulated 3D environments. We also characterize why and how language provides useful abstractions for exploration by considering the impacts of using representations from a pretrained model, a language oracle, and several ablations. We demonstrate the benefits of our approach in two very different task domains -- one that stresses the identification and manipulation of everyday objects, and one that requires navigational exploration in an expansive world -- and with two popular deep RL algorithms: Impala and R2D2. Our results suggest that using language-shaped representations could improve exploration for various algorithms and agents in challenging environments.
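To make the core idea concrete, the sketch below scores novelty in a language-shaped embedding space rather than in raw pixel space. This is a minimal illustrative sketch only, not the paper's exact formulation: the `encode` function stands in for whatever frozen vision-language encoder is used, and the episodic memory with a k-nearest-neighbour bonus is an assumed, commonly used novelty measure rather than the specific one evaluated in the paper.

```python
import numpy as np

class SemanticNoveltyBonus:
    """Episodic novelty bonus computed in a frozen, language-shaped
    embedding space instead of over raw observations.

    `encode` is assumed to map an observation to a fixed-size embedding,
    e.g. the visual tower of a model pretrained on image captioning data.
    The k-NN bonus below is a standard episodic-novelty recipe used here
    purely for illustration.
    """

    def __init__(self, encode, k=10):
        self.encode = encode      # frozen, language-shaped encoder (assumed)
        self.k = k                # number of nearest neighbours in the bonus
        self.memory = []          # embeddings seen so far this episode

    def reset(self):
        """Clear the episodic memory at the start of each episode."""
        self.memory = []

    def __call__(self, observation):
        """Return an intrinsic reward that is high when the observation's
        embedding is far from everything seen this episode."""
        z = np.asarray(self.encode(observation), dtype=np.float32)
        z = z / (np.linalg.norm(z) + 1e-8)   # normalise for cosine-style distances

        if not self.memory:
            self.memory.append(z)
            return 1.0                        # first state is maximally novel

        dists = np.array([np.linalg.norm(z - m) for m in self.memory])
        knn = np.sort(dists)[: self.k]        # distances to k nearest neighbours
        bonus = float(np.mean(knn))           # mean k-NN distance as novelty

        self.memory.append(z)
        return bonus
```

In training, such a bonus would typically be scaled and added to the extrinsic reward of an agent like Impala or R2D2, with the encoder kept frozen so that novelty is measured over semantically meaningful abstractions rather than pixel-level differences.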