We propose new width-based planning and learning algorithms inspired by a careful analysis of the design decisions made by previous width-based planners. The algorithms are applied to the Atari-2600 games, and our best-performing algorithm, Novelty guided Critical Path Learning (N-CPL), outperforms the previously introduced width-based planning and learning algorithms $\pi$-IW(1), $\pi$-IW(1)+ and $\pi$-HIW(n, 1). Furthermore, we present a taxonomy of the Atari-2600 games according to some of their defining characteristics. This analysis of the games provides further insight into the behaviour and performance of the algorithms introduced. In particular, for games with large branching factors and games with sparse meaningful rewards, N-CPL outperforms $\pi$-IW(1), $\pi$-IW(1)+ and $\pi$-HIW(n, 1).