A Framework for Following Temporal Logic Instructions with Unknown Causal Dependencies

Teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments is a challenging problem. We assume that the user specifies each task as a linear temporal logic (LTL) formula. However, some causal dependencies in complex environments may be unknown to the user in advance. Hence, the agent cannot always solve a task by simply following the instructions that the human user specifies. In this work, we propose a hierarchical reinforcement learning (HRL) framework in which a symbolic transition model is learned and used to produce high-level plans that guide the agent to solve different tasks efficiently. Specifically, the symbolic transition model is learned by inductive logic programming (ILP) to capture the logic rules of state transitions. By planning over the product of the symbolic transition model and the automaton derived from the LTL formula, the agent can resolve causal dependencies and break a causally complex problem down into a sequence of simpler low-level sub-tasks. We evaluate the proposed framework on three environments in both discrete and continuous domains, where it shows advantages over representative prior methods.
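To make the product-planning idea concrete, the following is a minimal sketch, not the method from the paper. It assumes a toy domain with hand-written symbolic rules (in the framework these would be learned by ILP), a hand-coded two-state automaton for the task "eventually has_gold", and breadth-first search as a stand-in planner. All names (`RULES`, `step_symbolic`, `step_dfa`, `plan`, and the propositions) are hypothetical.

```python
from collections import deque

# Hypothetical symbolic transition rules: subgoal -> (preconditions, effects).
# Assumption: written by hand here; the framework would learn such rules via ILP.
RULES = {
    "get_key": (frozenset(), frozenset({"has_key"})),
    "open_door": (frozenset({"has_key"}), frozenset({"door_open"})),
    "get_gold": (frozenset({"door_open"}), frozenset({"has_gold"})),
}

# Hypothetical automaton for the task "eventually has_gold":
# state 0 is the initial state, state 1 is accepting.
ACCEPTING = {1}


def step_symbolic(state, subgoal):
    """Apply a rule to a symbolic state; return None if preconditions fail."""
    pre, add = RULES[subgoal]
    if not pre <= state:
        return None
    return state | add


def step_dfa(q, state):
    """Advance the automaton on the propositions true in `state`."""
    if q == 0 and "has_gold" in state:
        return 1
    return q


def plan(init_state, init_q=0):
    """Breadth-first search over (symbolic state, automaton state) pairs."""
    s0 = frozenset(init_state)
    start = (s0, step_dfa(init_q, s0))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        (state, q), path = queue.popleft()
        if q in ACCEPTING:
            return path  # sequence of sub-tasks for the low-level policy
        for subgoal in RULES:
            nxt = step_symbolic(state, subgoal)
            if nxt is None:
                continue
            node = (nxt, step_dfa(q, nxt))
            if node not in visited:
                visited.add(node)
                queue.append((node, path + [subgoal]))
    return None  # task is unsatisfiable under the given rules


if __name__ == "__main__":
    print(plan(set()))
```

In this toy domain the instruction mentions only the gold, yet the plan also contains `get_key` and `open_door`, because the search runs over the product and the rules encode the dependencies the user left out. Each returned sub-task would then be handed to a low-level RL policy.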
