Improving sample efficiency is a key challenge in reinforcement learning, especially in environments with large state spaces and sparse rewards. In the literature, this is addressed either through auxiliary tasks (subgoals) or through clever exploration strategies: exploration methods are used to sample better trajectories in large environments, while auxiliary tasks are incorporated where the reward is sparse. However, few studies have attempted to tackle both large scale and reward sparsity at the same time. This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy. We present a way to learn value functions that can be used to sample actions and provide directed exploration. Experiments on navigation tasks with varying grid sizes demonstrate performance advantages over several competitive baselines.
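The abstract does not spell out the learning rule, but the core idea of learning auxiliary GVFs alongside the main task and using them to drive directed exploration can be sketched as follows. This is a minimal illustrative assumption, not the paper's algorithm: the gridworld layout, the subgoal cells, and the epsilon-style switch to a GVF-greedy action are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical sketch: tabular Q-learning on a gridworld where auxiliary GVFs
# (one per assumed subgoal) are learned off-policy alongside the main task, and
# the behaviour policy occasionally acts greedily w.r.t. a randomly chosen GVF,
# giving directed exploration instead of uniform epsilon-random actions.

GRID = 10                      # grid side length (states are (row, col) cells)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GOAL = (GRID - 1, GRID - 1)    # sparse main reward only at this cell
SUBGOALS = [(0, GRID - 1), (GRID - 1, 0)]      # assumed auxiliary subgoal cells

def step(state, a):
    """Deterministic gridworld transition with +1 reward only at the main goal."""
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), GRID - 1), min(max(c + dc, 0), GRID - 1))
    return nxt, float(nxt == GOAL), nxt == GOAL

q_main = np.zeros((GRID, GRID, len(ACTIONS)))                 # main-task values
q_gvfs = np.zeros((len(SUBGOALS), GRID, GRID, len(ACTIONS)))  # one GVF per subgoal

alpha, gamma, eps, episodes = 0.1, 0.99, 0.2, 500
rng = np.random.default_rng(0)

for _ in range(episodes):
    s, done, t = (0, 0), False, 0
    while not done and t < 4 * GRID * GRID:
        t += 1
        if rng.random() < eps:
            # Directed exploration: follow a randomly chosen auxiliary GVF
            # greedily rather than taking a uniformly random action.
            g = rng.integers(len(SUBGOALS))
            a = int(np.argmax(q_gvfs[g][s]))
        else:
            a = int(np.argmax(q_main[s]))
        s2, r, done = step(s, a)

        # Main-task Q-learning update on the sparse reward.
        target = r + (0.0 if done else gamma * np.max(q_main[s2]))
        q_main[s][a] += alpha * (target - q_main[s][a])

        # Off-policy GVF updates: each GVF has cumulant 1 and termination at
        # its own subgoal, so it learns values for reaching that subgoal.
        for g, sub in enumerate(SUBGOALS):
            cum, cont = (1.0, 0.0) if s2 == sub else (0.0, gamma)
            g_target = cum + cont * np.max(q_gvfs[g][s2])
            q_gvfs[g][s][a] += alpha * (g_target - q_gvfs[g][s][a])
        s = s2
```

In this sketch the GVFs act both as auxiliary prediction tasks (densifying the learning signal) and as exploration policies that steer the agent toward informative regions, which is the combination the abstract describes; the paper's actual parameterisation and exploration schedule may differ.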