Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI. This paper describes progress on this challenge in the context of man-made environments, which are visually diverse but contain intrinsic semantic regularities. We propose a hybrid model-based and model-free approach, LEArning and Planning with Semantics (LEAPS), consisting of a multi-target sub-policy that acts on visual inputs, and a Bayesian model over semantic structures. When placed in an unseen environment, the agent plans with the semantic model to make high-level decisions, proposes the next sub-target for the sub-policy to execute, and updates the semantic model based on new observations. We perform experiments on visual navigation tasks using House3D, a 3D environment that contains diverse human-designed indoor scenes with real-world objects. LEAPS outperforms strong baselines that do not explicitly plan using the semantic content.
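The plan-execute-update loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class names, the Beta-Bernoulli belief over room connectivity, and the greedy one-step planner are all assumptions made for clarity.

```python
# Hypothetical sketch of the LEAPS control loop: maintain Bayesian beliefs
# over semantic structure (here, room-to-room connectivity), plan a
# sub-target, and update beliefs from new observations.
# All names and modeling choices are illustrative assumptions.

class SemanticModel:
    """Bayesian beliefs over semantic structure of the environment."""

    def __init__(self, rooms):
        # One Beta-Bernoulli belief per ordered room pair (assumed prior Beta(1, 1)).
        self.beliefs = {(a, b): [1.0, 1.0] for a in rooms for b in rooms if a != b}

    def prob(self, a, b):
        # Posterior mean probability that room b is reachable from room a.
        s, f = self.beliefs[(a, b)]
        return s / (s + f)

    def update(self, a, b, connected):
        # Conjugate posterior update from a new observation.
        self.beliefs[(a, b)][0 if connected else 1] += 1.0


def plan_next_subtarget(model, current, target, rooms):
    # Greedy one-step plan: choose the intermediate room maximizing
    # P(reach it from here) * P(target reachable from it), or go
    # directly to the target if that is more promising.
    candidates = [r for r in rooms if r not in (current, target)]
    scored = [(model.prob(current, r) * model.prob(r, target), r) for r in candidates]
    scored.append((model.prob(current, target), target))
    return max(scored)[1]
```

In a full agent, the sub-target returned by the planner would be handed to the multi-target sub-policy acting on visual inputs, and the outcome of that execution would feed back into `update`.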