Learning Vision-based Robotic Manipulation Tasks Sequentially in Offline Reinforcement Learning Settings

With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online RL does not fit readily into this paradigm because agent-environment interaction is costly and time-consuming. Many offline RL algorithms have therefore recently been proposed for learning robotic tasks. However, these methods mainly focus on single-task or multi-task learning, which requires retraining every time a new task must be learned. Learning tasks continually without forgetting previous knowledge, combined with the power of offline deep RL, would allow us to scale the number of tasks by adding them one after another. In this paper, we investigate the effectiveness of regularisation-based methods such as synaptic intelligence for sequentially learning image-based robotic manipulation tasks in an offline RL setup. We evaluate the performance of this combined framework against the common challenges of sequential learning: catastrophic forgetting and forward knowledge transfer. We performed experiments with different task combinations to analyze the effect of task ordering. We also investigated the effect of the number of object configurations and the density of robot trajectories. We found that learning tasks sequentially helps propagate knowledge from previous tasks, thereby reducing the time required to learn a new task. Regularisation-based approaches to continual learning, such as synaptic intelligence, help mitigate catastrophic forgetting but showed only limited transfer of knowledge from previous tasks.
