Creating reinforcement learning (RL) agents that can accept and leverage task-specific knowledge from humans has long been identified as a possible strategy for developing scalable approaches to long-horizon problems. While previous works have explored combining symbolic models with RL, they tend to assume that the high-level action models are executable at the low level and that the fluents can exclusively characterize all desirable MDP states. This need not be true, and the assumption overlooks one of the central technical challenges of incorporating symbolic task knowledge, namely, that these symbolic models will be an incomplete representation of the underlying task. To this end, we introduce Symbolic-Model Guided Reinforcement Learning, in which we formalize the relationship between the symbolic model and the underlying MDP in a way that captures the incompleteness of the symbolic model. We use these models to extract high-level landmarks that decompose the task, and at the low level we learn a set of diverse policies for each possible task sub-goal identified by the landmarks. We evaluate our system on three different benchmark domains and show that, even with incomplete symbolic-model information, our approach discovers the task structure and efficiently guides the RL agent toward the goal.