The use of broad datasets has proven crucial for generalization across a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data, combined with online fine-tuning guided by subgoals in a learned lossy representation space. When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decompose the original task into easier problems. Learned from the broad data, the lossy representation emphasizes task-relevant information about states and goals while abstracting away redundant contexts that hinder generalization. It thus enables subgoal planning for unseen tasks, provides a compact input to the policy, and facilitates reward shaping during fine-tuning. We show that our framework can be pre-trained on large-scale datasets of robot experience from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs and without any manual reward engineering.
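To make the pipeline concrete, the following is a minimal, hypothetical sketch of the core idea: encode states and goals into a compact lossy latent space, let an affordance model propose a sequence of latent subgoals, and shape rewards by distance to the current subgoal. All function names and models here are illustrative stand-ins (the encoder is a fixed random projection and the "affordance model" is simple latent interpolation), not the paper's actual learned components.

```python
import numpy as np

LATENT_DIM = 8

def encode(obs):
    # Stand-in for the learned lossy encoder phi(s): a fixed linear
    # projection of the raw observation to a compact latent vector.
    # The real encoder would be trained on the broad offline data.
    proj = np.linspace(-1.0, 1.0, obs.size * LATENT_DIM)
    return obs @ proj.reshape(obs.size, LATENT_DIM)

def plan_subgoals(z_start, z_goal, n_subgoals=3):
    # Stand-in affordance model: interpolate in latent space to split
    # a temporally extended task into easier intermediate subgoals.
    # The real model would be generative and sample reachable latents.
    alphas = np.linspace(0.0, 1.0, n_subgoals + 2)[1:-1]
    return [(1.0 - a) * z_start + a * z_goal for a in alphas]

def shaped_reward(z_state, z_subgoal):
    # Reward shaping during fine-tuning: negative distance to the
    # current subgoal in the lossy space, with no manual engineering.
    return -float(np.linalg.norm(z_state - z_subgoal))

# Plan subgoals for a toy "task" given raw start/goal observations.
z0 = encode(np.array([0.0, 1.0]))
zg = encode(np.array([1.0, 0.0]))
subgoals = plan_subgoals(z0, zg, n_subgoals=3)
```

A goal-conditioned policy would then be conditioned on each latent subgoal in turn, receiving `shaped_reward` until the subgoal is reached, which is the role the lossy representation plays as both a compact policy input and a planning space.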