Robotic automation for pick-and-place tasks has a vast range of applications. Deep Reinforcement Learning (DRL) is one of the leading robotic automation techniques and has achieved dexterous manipulation and locomotion skills in robotics. However, a major drawback of DRL is its data-hungry training regime, which requires millions of trial-and-error attempts, impractical on real robotic hardware. We propose a multi-subtask reinforcement learning method in which complex tasks are decomposed into low-level subtasks. These subtasks are parametrised as expert networks and learnt via existing DRL methods. The trained subtasks are then choreographed by a high-level synthesizer. As a test bed, we use a pick-and-place robotic simulator and transfer the learnt behaviour to a real robotic system. We show that our method outperforms an imitation-learning-based method and reaches a high success rate compared to an end-to-end learning approach. Furthermore, we investigate the trained subtasks and demonstrate adaptive behaviour by fine-tuning a subset of subtasks on a different task. Our approach deviates from the end-to-end learning strategy and provides an initial direction towards learning modular task representations that can generate robust behaviours.
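To make the architecture concrete, below is a minimal sketch of the decomposition, assuming a fixed sequential pick-and-place schedule; the `SubtaskPolicy` interface, the `Synthesizer` class, and the subtask ordering are illustrative assumptions, not the paper's implementation. Trained low-level expert networks expose a common policy interface, and a high-level synthesizer decides which expert acts at each step.

```python
# Illustrative sketch (not the authors' code): low-level subtask experts are
# trained separately with an off-the-shelf DRL algorithm, then sequenced by a
# high-level synthesizer. All names here are hypothetical.
from __future__ import annotations

from typing import Protocol, Sequence


class SubtaskPolicy(Protocol):
    """Interface assumed for a trained low-level expert network."""

    def act(self, observation: list[float]) -> list[float]:
        """Map an observation to a low-level action."""
        ...

    def done(self, observation: list[float]) -> bool:
        """Report whether this subtask's goal has been reached."""
        ...


class Synthesizer:
    """High-level controller that choreographs trained subtask experts.

    Here the schedule is a fixed sequence (e.g., reach -> grasp -> move ->
    place); a learnt synthesizer would instead select the next subtask
    from the observation.
    """

    def __init__(self, subtasks: Sequence[SubtaskPolicy]) -> None:
        self.subtasks = list(subtasks)
        self.current = 0

    def act(self, observation: list[float]) -> list[float] | None:
        # Advance past every subtask that already reports completion.
        while (self.current < len(self.subtasks)
               and self.subtasks[self.current].done(observation)):
            self.current += 1
        if self.current == len(self.subtasks):
            return None  # full pick-and-place task completed
        return self.subtasks[self.current].act(observation)
```

Under this framing, the adaptive behaviour described above corresponds to fine-tuning only a subset of the expert policies while reusing the rest of the decomposition unchanged.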