Deep reinforcement learning has proven effective in a variety of applications, offering a promising direction for solving highly complex tasks. However, naively applying classical RL to learn a complex long-horizon task with a single control policy is inefficient. Policy modularization addresses this problem by learning a set of modules mapped to primitives and orchestrating them appropriately. In this study, we extend this line of work by allowing simultaneous activation of skills and by structuring them recursively into multiple hierarchies. Moreover, we devise an algorithm that orchestrates skills with different action spaces via multiplicative Gaussian distributions, which greatly increases their reusability. By exploiting this modularity, interpretability can also be achieved: when each skill is known, one can identify which modules are used in a new task. We demonstrate how the proposed scheme can be employed in practice by solving a pick-and-place task with a 6-DoF manipulator, and we examine the effect of each property through ablation studies.
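To make the skill composition concrete, the following is a minimal sketch of multiplicative Gaussian composition; the symbols $w_i$, $\mu_i$, and $\sigma_i$ are illustrative and not taken from the paper. Assuming each active skill $i$ outputs a Gaussian $\mathcal{N}(\mu_i(s), \sigma_i^2(s))$ over an action dimension it controls, with a non-negative gating weight $w_i(s)$, the composed policy on that dimension is the weighted product

\[
\pi(a \mid s) \;\propto\; \prod_{i} \mathcal{N}\!\big(a;\, \mu_i(s),\, \sigma_i^2(s)\big)^{w_i(s)},
\]

which is itself Gaussian with precision-weighted parameters

\[
\sigma^2(s) = \Big(\sum_i \frac{w_i(s)}{\sigma_i^2(s)}\Big)^{-1},
\qquad
\mu(s) = \sigma^2(s) \sum_i \frac{w_i(s)\,\mu_i(s)}{\sigma_i^2(s)}.
\]

Under this formulation, skills with different action spaces compose naturally: each skill contributes factors only on the action dimensions it acts on, leaving the remaining dimensions unaffected.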