In this paper, we explore an approach to auxiliary task discovery in reinforcement learning based on ideas from representation learning. Auxiliary tasks tend to improve data efficiency by forcing the agent to learn auxiliary prediction and control objectives in addition to the main task of maximizing reward, thereby producing better representations. Typically these tasks are designed by people. Meta-learning offers a promising avenue for automatic task discovery; however, these methods are computationally expensive and challenging to tune in practice. In this paper, we explore a complementary approach to auxiliary task discovery: continually generating new auxiliary tasks and preserving only those with high utility. We also introduce a new measure of auxiliary task usefulness based on how useful the features they induce are for the main task. Our discovery algorithm significantly outperforms random tasks, hand-designed tasks, and learning without auxiliary tasks across a suite of environments.
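The generate-and-test idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: the `Task` class, the exponential-average utility update, and the use of main-task feature-weight magnitude as the usefulness signal are all assumptions made here for concreteness.

```python
class Task:
    """A placeholder auxiliary task with a running utility estimate."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.utility = 0.0  # running estimate of feature usefulness

def generate_task(next_id):
    # Stand-in for sampling a fresh auxiliary task (e.g. a random subgoal).
    return Task(next_id)

def update_utilities(tasks, feature_weights):
    # One possible usefulness measure (an assumption): the magnitude of the
    # main task's weights on the features each auxiliary task induces,
    # tracked as an exponential moving average.
    for task, w in zip(tasks, feature_weights):
        task.utility = 0.99 * task.utility + 0.01 * abs(w)

def replace_weakest(tasks, next_id, replace_fraction=0.2):
    # Tester step: discard the lowest-utility tasks and generate new ones,
    # so only tasks with high utility are preserved over time.
    tasks.sort(key=lambda t: t.utility)
    n_replace = max(1, int(len(tasks) * replace_fraction))
    for i in range(n_replace):
        tasks[i] = generate_task(next_id + i)
    return tasks, next_id + n_replace
```

In use, `update_utilities` would be called each training step with the current main-task weights over each auxiliary task's features, and `replace_weakest` periodically, so the task pool drifts toward tasks whose features the main task actually relies on.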