Deep Reinforcement Learning has recently been very successful across a range of complex domains. Most works are concerned with learning a single policy that solves the target task but is fixed, in the sense that if the environment changes the agent is unable to adapt to it. Successor Features (SFs) provide a mechanism for learning policies that are not tied to any particular reward function. In this work we investigate how SFs can be pre-trained without observing any reward in a custom environment that features resource collection, traps and crafting. After pre-training we expose the SF agents to various target tasks and measure how well they transfer to these new tasks. Transfer requires no further training of the SF agents; instead, only a task vector is provided. For training the SFs we propose a task relabelling method which greatly improves the agents' performance.
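As context for the transfer mechanism mentioned above, a brief sketch of the standard SF decomposition from the broader SF literature (the abstract itself does not spell this out, and the paper's exact notation may differ): rewards are assumed to factor through a feature vector, so a task is fully specified by a weight vector $\mathbf{w}$ and transfer reduces to a dot product,
$$ r(s, a, s') = \phi(s, a, s')^{\top} \mathbf{w}, \qquad \psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \phi_{t+1} \,\middle|\, s_0 = s,\, a_0 = a\right], $$
$$ Q^{\pi}_{\mathbf{w}}(s, a) = \psi^{\pi}(s, a)^{\top} \mathbf{w}. $$
Under this view, evaluating a pre-trained policy on a new task only requires plugging in the new task vector $\mathbf{w}$, which is consistent with the zero-further-training transfer described in the abstract.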