Practising and honing skills is a fundamental component of how humans learn, yet artificial agents are rarely explicitly trained to perform them. Instead, they are usually trained end-to-end, in the hope that useful skills will be implicitly learned in order to maximise the discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method, Hierarchical Kickstarting (HKS), which is shown to outperform all other evaluated methods. Our experiments show that learning with prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards.