We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning. First, DiRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.
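To make the interplay between the abstract graph and Dijkstra-style planning concrete, the following is a minimal sketch, not the paper's implementation: vertices stand for regions of the state space, each edge carries an estimated success probability of its learned sub-task policy (stubbed here with fixed numbers in place of RL-trained neural policies), and the planner minimizes the sum of negative log-probabilities, i.e. maximizes the product of sub-task success probabilities along the plan. All region names and probabilities are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Abstract graph: vertices are region labels, edges are sub-tasks.
# Each edge stores an estimated success probability of its sub-task
# policy; in practice these estimates would come from evaluating
# RL-trained neural network policies (values here are placeholders).
edges: Dict[str, List[Tuple[str, float]]] = {
    "start": [("door", 0.9), ("key", 0.8)],
    "key":   [("door", 0.95)],
    "door":  [("goal", 0.7)],
    "goal":  [],
}

def dijkstra_plan(source: str, target: str) -> List[str]:
    """Dijkstra-style search over the abstract graph: edge cost is
    -log(success probability), so the shortest path maximizes the
    product of sub-task success probabilities."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    frontier = [(0.0, source)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, p in edges[u]:
            nd = d - math.log(p)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(frontier, (nd, v))
    # Reconstruct the high-level plan (sequence of regions to traverse).
    plan, node = [target], target
    while node != source:
        node = prev[node]
        plan.append(node)
    return list(reversed(plan))

if __name__ == "__main__":
    print(dijkstra_plan("start", "goal"))  # ['start', 'door', 'goal']
```

In this toy instance the direct route through "door" wins (0.9 * 0.7 = 0.63) over the detour through "key" (0.8 * 0.95 * 0.7 ≈ 0.53); in the full approach such edge estimates would be refreshed as the sub-task policies are trained, interleaving planning with reinforcement learning.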