Width-based planning has shown promising results on Atari 2600 games using pixel input, while using substantially fewer environment interactions than reinforcement learning. Recent width-based approaches compute a feature vector for each screen, using either a hand-designed feature set or a variational autoencoder (VAE) trained on game screens, and prune screens that have no novel features during the search. In this paper, we explore accounting for uncertainty in the features generated by a VAE during width-based planning. Our primary contribution is the introduction of active learning to maximize the utility of screens observed during planning. Experimental results demonstrate that active learning strategies increase gameplay scores compared to alternative width-based approaches given an equal number of environment interactions.
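The novelty-based pruning mentioned above can be illustrated with a minimal sketch. It assumes each screen has already been mapped to a discrete feature vector and uses the simplest novelty test (width 1): a screen is kept only if at least one (feature index, value) pair has not been seen earlier in the search. The names `NoveltyTable` and `prune_non_novel` are hypothetical, and the sketch does not model feature uncertainty or the active learning strategies studied in the paper.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical encoding: a feature is an (index, value) pair.
Feature = Tuple[int, int]


class NoveltyTable:
    """Records every (index, value) pair seen so far in one search."""

    def __init__(self) -> None:
        self.seen: Set[Feature] = set()

    def is_novel(self, features: Iterable[int]) -> bool:
        """Return True if any (index, value) pair is new, then record all pairs."""
        pairs = set(enumerate(features))
        has_new = not pairs <= self.seen
        self.seen |= pairs
        return has_new


def prune_non_novel(screens: Sequence[Sequence[int]]) -> List[int]:
    """Return the positions of screens that pass the width-1 novelty test."""
    table = NoveltyTable()
    return [i for i, feats in enumerate(screens) if table.is_novel(feats)]


if __name__ == "__main__":
    # Illustrative feature vectors only, not data from the paper.
    example = [[0, 1], [0, 1], [1, 1], [0, 1]]
    print(prune_non_novel(example))
```

In an actual planner the test would be applied as nodes are generated during the search, so that pruned screens are never expanded; the batch form here only keeps the example short.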