Reinforcement Learning (RL) and Imitation Learning (IL) have made great progress in robotic control in recent years. However, both degrade noticeably on new tasks that must be completed through new combinations of actions. RL methods rely heavily on reward functions that do not generalize well to new tasks, while IL methods are limited by expert demonstrations that do not cover new tasks. In contrast, humans can easily complete such tasks using fragmented knowledge learned from task-agnostic experience. Inspired by this observation, this paper proposes a task-agnostic learning method (TAL) that learns fragmented knowledge from task-agnostic data in order to accomplish new tasks. TAL consists of four stages. First, task-agnostic exploration is performed to collect data from interactions with the environment, and the collected data is organized as a knowledge graph. Compared with the sequential structure used previously, the knowledge graph representation is more compact and better suited to environment exploration. Second, an action feature extractor is proposed and trained on the collected knowledge graph data to learn task-agnostic fragmented knowledge. Third, a candidate action generator is designed, which applies the action feature extractor to a new task to generate multiple candidate action sets. Finally, an action proposal is designed to produce probabilities for actions in the new task according to the environmental information. These probabilities are then used to select the actions to execute from the candidate action sets, and the selected actions form the plan. Experiments on a virtual indoor scene show that the proposed method outperforms the state-of-the-art offline RL method CQL (Conservative Q-Learning) by 35.28% and the IL method BC (Behavior Cloning) by 22.22%.
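To make the compactness claim for the first stage concrete, the following is a minimal sketch of how exploration data might be stored as a graph rather than as a list of trajectories. The abstract does not specify the graph schema, so everything here is an assumption for illustration: the `TransitionGraph` class, the choice of states as nodes and actions as edge labels, and the toy state and action names are all hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Action = str
Transition = Tuple[State, Action, State]


class TransitionGraph:
    """Hypothetical store: states are nodes, actions label directed edges.

    This is an illustrative assumption, not the schema used by TAL.
    """

    def __init__(self) -> None:
        # state -> action -> set of observed next states
        self.edges: Dict[State, Dict[Action, Set[State]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add(self, state: State, action: Action, next_state: State) -> None:
        # Sets deduplicate, so a transition seen in many episodes is stored once.
        self.edges[state][action].add(next_state)

    def add_episode(self, episode: List[Transition]) -> None:
        for state, action, next_state in episode:
            self.add(state, action, next_state)

    def num_edges(self) -> int:
        # Count distinct (state, action, next_state) triples.
        return sum(
            len(targets)
            for actions in self.edges.values()
            for targets in actions.values()
        )

    def successors(self, state: State) -> Dict[Action, Set[State]]:
        # Use .get so that querying an unseen state does not create a node.
        return {a: set(t) for a, t in self.edges.get(state, {}).items()}


# Two toy exploration episodes that share their first transition.
episode_a = [
    ("door_closed", "open_door", "door_open"),
    ("door_open", "walk_in", "in_room"),
]
episode_b = [
    ("door_closed", "open_door", "door_open"),
    ("door_open", "close_door", "door_closed"),
]

graph = TransitionGraph()
graph.add_episode(episode_a)
graph.add_episode(episode_b)

sequential_size = len(episode_a) + len(episode_b)  # transitions stored as sequences
print(sequential_size, graph.num_edges())
print(graph.successors("door_open"))
```

In this toy case the sequential form stores four transitions while the graph stores three, because the shared `open_door` transition is kept only once. Querying `successors` for a state also returns every action observed from it across all episodes, which is one plausible reason a graph layout could suit exploration data.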