Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint necessitates approaches that work with limited environmental interaction by maximizing the reuse of previous experiences. We propose an approach that maximizes experience reuse while learning to solve a given task by generating and simultaneously learning useful auxiliary tasks. To generate these tasks, we construct an abstract temporal logic representation of the given task and leverage large language models to generate context-aware object embeddings that facilitate object replacements. Counterfactual reasoning and off-policy methods allow us to simultaneously learn these auxiliary tasks while solving the given target task. We combine these insights into a novel framework for multitask reinforcement learning and experimentally show that our generated auxiliary tasks share underlying exploration requirements similar to those of the given task, thereby maximizing the utility of directed exploration. Our approach allows agents to automatically learn additional useful policies without extra environment interaction.
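The core mechanism alluded to above, reusing a single stream of experience to train policies for several tasks at once by counterfactually relabeling each transition under every task's reward and applying an off-policy update, can be illustrated with a minimal sketch. The task predicates, state strings, and tabular Q-learning update below are hypothetical simplifications for illustration, not the paper's implementation.

```python
"""Illustrative sketch (assumptions, not the paper's code): one stream of
environment experience updates value functions for a target task and a
generated auxiliary task.  Each task supplies a reward/termination
predicate, so stored transitions are relabeled counterfactually and
updated off-policy (tabular Q-learning here for brevity)."""

from collections import defaultdict

GAMMA, ALPHA = 0.99, 0.1
ACTIONS = ["left", "right", "pick"]

# Hypothetical task definitions: each maps a transition to (reward, done).
# In the paper these would instead come from the temporal-logic task
# representation, with objects swapped via LLM-suggested replacements.
def reach_red_block(s, a, s_next):   # target task (placeholder)
    return (1.0, True) if s_next == "at_red_block" else (0.0, False)

def reach_blue_block(s, a, s_next):  # generated auxiliary task (placeholder)
    return (1.0, True) if s_next == "at_blue_block" else (0.0, False)

TASKS = {"target": reach_red_block, "aux_blue": reach_blue_block}

# One Q-table per task; all are trained from the same shared transitions.
q_tables = {name: defaultdict(lambda: defaultdict(float)) for name in TASKS}

def update_all_tasks(s, a, s_next):
    """Counterfactually relabel one transition for every task and apply an
    off-policy Q-learning update, so the auxiliary tasks require no extra
    environment interaction."""
    for name, reward_fn in TASKS.items():
        r, done = reward_fn(s, a, s_next)
        q = q_tables[name]
        target = r if done else r + GAMMA * max(q[s_next][b] for b in ACTIONS)
        q[s][a] += ALPHA * (target - q[s][a])

# Usage: a dummy transition stream collected while pursuing the target task.
transitions = [("start", "right", "at_blue_block"),
               ("at_blue_block", "pick", "at_red_block")]
for s, a, s_next in transitions:
    update_all_tasks(s, a, s_next)
```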