The use of broad datasets has proven crucial for generalization across a wide range of fields. However, how to effectively leverage diverse multi-task data for novel downstream tasks remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data, combined with online fine-tuning guided by subgoals in a learned lossy representation space. When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decompose the original task into easier problems. Learned from the broad data, the lossy representation emphasizes task-relevant information about states and goals while abstracting away redundant context that hinders generalization. It thus enables subgoal planning for unseen tasks, provides a compact input to the policy, and facilitates reward shaping during fine-tuning. We show that our framework can be pre-trained on large-scale datasets of robot experience from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs and without any manual reward engineering.
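The planning loop described above can be illustrated with a minimal sketch. All names here (`encode`, `affordance_model`, `plan_subgoals`) are hypothetical stand-ins, not the paper's actual models: the encoder is a toy lossy map, and the affordance model simply proposes nearby representations, which a greedy planner chains toward the goal representation.

```python
# Minimal sketch of affordance-based subgoal planning in a lossy
# representation space. All components are toy stand-ins for the
# learned models described in the abstract, not the real method.
import random

def encode(state):
    # Stand-in for the learned lossy encoder: quantizing the state
    # abstracts away fine-grained, task-irrelevant detail.
    return round(state, 1)

def affordance_model(z, num_candidates=8, step=0.3):
    # Stand-in affordance model: proposes plausibly reachable next
    # representations within a fixed radius of the current one.
    return [encode(z + random.uniform(-step, step)) for _ in range(num_candidates)]

def plan_subgoals(start_state, goal_state, max_steps=50, tol=0.15):
    """Greedily chain affordance proposals toward the goal representation,
    decomposing the task into a sequence of easier subgoal-reaching problems."""
    z, z_goal = encode(start_state), encode(goal_state)
    subgoals = []
    for _ in range(max_steps):
        if abs(z - z_goal) < tol:
            break
        # Among the proposed reachable subgoals, pick the one closest
        # to the goal representation.
        z = min(affordance_model(z), key=lambda c: abs(c - z_goal))
        subgoals.append(z)
    return subgoals

random.seed(0)
plan = plan_subgoals(0.0, 2.0)
print(plan)
```

In the full framework, each planned subgoal would condition the policy in turn, and distances in the representation space would provide shaped rewards during online fine-tuning.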