Recent breakthroughs in artificial intelligence have shown that combining tree-based planning with deep learning can lead to superior performance. We present Adaptive Entropy Tree Search (ANTS), a novel algorithm that combines planning and learning in the maximum entropy paradigm. Through a comprehensive suite of experiments on the Atari benchmark, we show that ANTS significantly outperforms PUCT, the planning component of the state-of-the-art AlphaZero system. ANTS builds upon recent work on maximum entropy planning methods, which, as we show, fail when combined with learning. ANTS resolves this issue and reaches state-of-the-art performance. We further find that ANTS is more robust to hyperparameter choices than the previous algorithms. We believe that the high performance and robustness of ANTS can bring tree search planning one step closer to wide practical adoption.