Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing each task as an automaton and integrating it into learning and planning. We state and prove conditions under which LOF learns satisfying, optimal policies. Lastly, we show how LOF's learned policies can be composed to satisfy unseen tasks with only 10-50 retraining steps. We evaluate LOF on four tasks in discrete and continuous domains, including a 3D pick-and-place environment.
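To illustrate the task-as-automaton idea the abstract describes, below is a minimal sketch, not the paper's implementation: a finite-state automaton over propositions for a hypothetical sequential task ("reach a, then b, while avoiding hazard"). The state names, propositions, and transition table are illustrative assumptions; in LOF, a hierarchical learner would pair each proposition with a learned option (subpolicy), and planning over automaton states composes those options to satisfy new tasks.

```python
# Minimal sketch (hypothetical task, not the paper's code) of a task
# represented as a finite-state automaton (FSA) over propositions.
from typing import FrozenSet

# Transition table: (automaton state, propositions true in env) -> next state.
# Hypothetical task: satisfy proposition "a", then "b", avoiding "hazard".
FSA = {
    ("q0", frozenset({"a"})): "q1",
    ("q1", frozenset({"b"})): "q_goal",
}

def step(q: str, props: FrozenSet[str]) -> str:
    """Advance the automaton given the propositions true in the current env state."""
    if "hazard" in props:
        return "q_fail"               # a violating proposition traps the automaton
    return FSA.get((q, props), q)     # stay in the same state if no transition fires

if __name__ == "__main__":
    q = "q0"
    for props in [frozenset(), frozenset({"a"}), frozenset({"b"})]:
        q = step(q, props)
    print(q)  # -> "q_goal": the trace (a, then b) satisfies the task
```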