For service robots to become general-purpose in everyday household environments, they need not only a large library of primitive skills but also the ability to quickly learn novel tasks specified by users. Fine-tuning neural networks on a variety of downstream tasks has been successful in many vision and language domains, but research on transfer learning between diverse long-horizon tasks remains limited. We propose that, rather than learning a new household activity from scratch with reinforcement learning, home robots can benefit from transferring value and policy networks trained on similar tasks. We evaluate this idea in the BEHAVIOR simulation benchmark, which includes a large number of household activities and a set of action primitives. To simplify the mapping between the state spaces of different tasks, we provide a text-based representation and leverage language models to produce a common embedding space. The results show that the selection of similar source activities can be informed by the semantic similarity of their state and goal descriptions to those of the target task. We further analyze the results and discuss ways to overcome the problem of catastrophic forgetting.
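The source-selection idea above, ranking candidate source activities by how similar their state and goal descriptions are to the target's, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `embed` function is a toy bag-of-words stand-in for a language-model embedding, and the task names and descriptions are hypothetical rather than taken from BEHAVIOR.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a language-model embedding (assumption):
    a bag-of-words count vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sources(target_desc, source_descs):
    """Return (task name, similarity) pairs, most similar first."""
    target_vec = embed(target_desc)
    scored = [(name, cosine(target_vec, embed(desc)))
              for name, desc in source_descs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical text descriptions of state and goal for each activity.
SOURCES = {
    "putting_away_dishes": "plates on counter; goal: plates inside cabinet",
    "watering_plants": "plant on floor, bottle on table; goal: plant is soaked",
}
TARGET = "bowls on counter; goal: bowls inside cabinet"

ranking = rank_sources(TARGET, SOURCES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch, the top-ranked activity would be the candidate whose value and policy networks are fine-tuned on the target task. A real system would replace `embed` with a pretrained language model so that related words (for example, "plates" and "bowls") also contribute to the similarity.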