Learning to solve long-horizon, temporally extended tasks with reinforcement learning has been a long-standing challenge. We believe it is important both to leverage the hierarchical structure of complex tasks and to use expert supervision whenever possible. This work introduces an interpretable hierarchical agent framework that combines planning with semantic goal-directed reinforcement learning. We assume access to certain spatial and haptic predicates and use them to construct a simple yet powerful semantic goal space. These semantic goal representations are more interpretable, making expert supervision and intervention easier, and they eliminate the need to hand-write complex, dense reward functions, thereby reducing human engineering effort. We evaluate our framework on a robotic block-manipulation task and show that it outperforms other methods, including both sparse- and dense-reward baselines. We also suggest next steps and discuss how this framework eases interaction and collaboration with humans.
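To make the idea of a predicate-based semantic goal space concrete, here is a minimal sketch. The predicate names (`on`, `grasped`) and the dictionary encoding are illustrative assumptions, not the paper's exact representation; the point is that a goal is a set of grounded predicates, and the reward is sparse and interpretable: it fires only when the current state satisfies every predicate in the goal, so no dense reward shaping is needed.

```python
# Hypothetical sketch: a semantic goal as a set of grounded predicates.
# Predicate names and the dict encoding are illustrative assumptions.

def goal_reward(state, goal):
    """Return 1.0 iff every (predicate, value) pair in `goal` holds in `state`."""
    return 1.0 if all(state.get(p) == v for p, v in goal.items()) else 0.0

# Example goal: stack block A on block B, with the gripper empty.
goal = {"on(A, B)": True, "grasped(A)": False}

state_unsolved = {"on(A, B)": False, "on(A, table)": True, "grasped(A)": False}
state_solved = {"on(A, B)": True, "on(A, table)": False, "grasped(A)": False}
```

Because each predicate is human-readable, an expert can inspect or override a goal directly, which is what makes supervision and intervention straightforward in this representation.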