Deep reinforcement learning has shown its effectiveness in various applications and provides a promising direction for solving tasks of high complexity. In most reinforcement learning algorithms, however, two major issues need to be addressed: sample inefficiency and the interpretability of the learned policy. The former arises when the environment is sparsely rewarded and/or suffers from a long-term credit assignment problem, while the latter becomes a concern when learned policies are deployed in customer-side products. In this paper, we propose a novel hierarchical reinforcement learning algorithm that mitigates these issues by decomposing the original task into a hierarchy and by compounding pretrained primitives with intents. We show how the proposed scheme can be employed in practice by solving a pick-and-place task with a 6-DoF manipulator.