Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark

We propose new width-based planning and learning algorithms for the Atari-2600 benchmark. The algorithms are inspired by a careful analysis of the design decisions made in previous width-based planners. We benchmark the new algorithms on the Atari-2600 games and show that our best-performing algorithm, RIW$_C$+CPV, outperforms the previously introduced width-based planning and learning algorithms $\pi$-IW(1), $\pi$-IW(1)+ and $\pi$-HIW(n, 1). Furthermore, we present a taxonomy of the Atari-2600 games according to some of their defining characteristics. This analysis provides further insight into the behaviour and performance of the width-based algorithms introduced. In particular, for games with large branching factors and for games with sparse meaningful rewards, RIW$_C$+CPV outperforms $\pi$-IW(1), $\pi$-IW(1)+ and $\pi$-HIW(n, 1).
