Towards Hierarchical Task Decomposition using Deep Reinforcement Learning for Pick and Place Subtasks

Deep Reinforcement Learning (DRL) is emerging as a promising approach for generating adaptive behaviors on robotic platforms. However, a major drawback of DRL is its data-hungry training regime, which requires millions of trial-and-error attempts and is therefore impractical for experiments on physical robotic systems. Learning from Demonstrations (LfD) has been introduced to address this issue by cloning the behavior shown in expert demonstrations. However, LfD requires a large number of demonstrations, which are difficult to acquire because they need dedicated, complex setups. To overcome these limitations, we propose a multi-subtask reinforcement learning methodology in which complex pick and place tasks are decomposed into low-level subtasks. These subtasks are parametrized as expert networks and learned via DRL methods. The trained subtasks are then combined by a high-level choreographer to accomplish the intended pick and place task from different initial configurations. As a testbed, we use a pick and place robotic simulator to demonstrate our methodology and show that it outperforms an LfD-based benchmark in terms of sample efficiency. We transfer the learned policy to the real robotic system and demonstrate robust grasping of objects with various geometric shapes.
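To make the decomposition concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy one-dimensional world with integer positions, three hypothetical subtasks (`approach`, `grasp`, `place`), hand-written functions standing in for the trained expert networks, and a simple sequential rule standing in for the high-level choreographer. All names and the state layout are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy state: gripper position "x", object position "obj",
# target position "goal", and a "holding" flag (0 or 1). All integers.
State = Dict[str, int]


def step_toward(a: int, b: int) -> int:
    """Move a one unit toward b (or stay put if they are equal)."""
    return a + (b > a) - (b < a)


@dataclass
class Subtask:
    name: str
    policy: Callable[[State], State]  # stand-in for a trained expert network
    done: Callable[[State], bool]     # termination condition of the subtask


def approach(s: State) -> State:
    # Move the gripper one step toward the object.
    return {**s, "x": step_toward(s["x"], s["obj"])}


def grasp(s: State) -> State:
    # Close the gripper on the object.
    return {**s, "holding": 1}


def place(s: State) -> State:
    # Carry the object toward the goal, then release it there.
    if s["obj"] != s["goal"]:
        nxt = step_toward(s["obj"], s["goal"])
        return {**s, "x": nxt, "obj": nxt}
    return {**s, "holding": 0}


SUBTASKS: List[Subtask] = [
    Subtask("approach", approach, lambda s: s["x"] == s["obj"]),
    Subtask("grasp", grasp, lambda s: s["holding"] == 1),
    Subtask("place", place, lambda s: s["obj"] == s["goal"] and s["holding"] == 0),
]


def run_choreographer(
    state: State, subtasks: List[Subtask], max_steps: int = 100
) -> Tuple[State, List[str]]:
    """Rule-based stand-in for the choreographer: run each subtask until done."""
    trace: List[str] = []
    for task in subtasks:
        while not task.done(state):
            if len(trace) >= max_steps:
                raise RuntimeError(f"subtask {task.name!r} did not terminate")
            state = task.policy(state)
            trace.append(task.name)
    return state, trace
```

Calling `run_choreographer({"x": 0, "obj": 3, "goal": 7, "holding": 0}, SUBTASKS)` returns the final state together with the sequence of subtask names that were executed. Changing the initial values of `x`, `obj`, and `goal` mimics the different initial configurations mentioned above. In a learned setting, each `policy` would be replaced by a trained network, and the sequencing rule could itself be learned.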
