Lookahead search has been a critical component of recent AI successes, such as in the games of chess, Go, and poker. However, the search methods used in these games, and in many other settings, are tabular. Tabular search methods do not scale well with the size of the search space, and this problem is exacerbated by stochasticity and partial observability. In this work we replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning, and show that this approach outperforms state-of-the-art search algorithms in benchmark settings. In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi, and we demonstrate the generality of our approach by showing that it also outperforms tabular search in the Atari game Ms. Pacman.
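The core idea above can be sketched concretely: instead of expanding a tabular search tree at each decision point, copy the policy network's parameters and fine-tune that copy with reinforcement learning on rollouts simulated in a model of the environment, then act with the tuned policy. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm: the policy is a per-state table of logits standing in for a network, the fine-tuner is plain REINFORCE with a mean-return baseline, and `finetune_and_act` and `toy_model` are hypothetical names introduced here.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def finetune_and_act(model_step, n_actions, state, base_logits,
                     horizon=4, iters=200, lr=0.5, n_rollouts=16, seed=0):
    """Copy the base policy and improve it online for the current state,
    using REINFORCE on rollouts simulated in the model; then act greedily.
    A sketch of the idea only, not the paper's implementation."""
    rng = np.random.default_rng(seed)
    # Copy so the base policy is left untouched, as in online fine-tuning.
    logits = {s: v.copy() for s, v in base_logits.items()}
    for _ in range(iters):
        trajs, returns = [], []
        for _ in range(n_rollouts):
            s, traj, ret = state, [], 0.0
            for _ in range(horizon):
                p = softmax(logits.setdefault(s, np.zeros(n_actions)))
                a = int(rng.choice(n_actions, p=p))
                traj.append((s, a))
                s, r, done = model_step(s, a, rng)
                ret += r
                if done:
                    break
            trajs.append(traj)
            returns.append(ret)
        baseline = float(np.mean(returns))  # variance-reduction baseline
        grads = {s: np.zeros(n_actions) for s in logits}
        for traj, ret in zip(trajs, returns):
            adv = ret - baseline
            for s, a in traj:
                g = -softmax(logits[s])  # grad of log-softmax: onehot(a) - p
                g[a] += 1.0
                grads[s] += adv * g
        for s in grads:
            logits[s] = logits[s] + lr * grads[s] / n_rollouts
    return int(np.argmax(logits[state]))  # act greedily with the tuned policy

# Hypothetical toy model: the episode ends after one step; action 1 is rewarded.
def toy_model(s, a, rng):
    return s, float(a == 1), True

base = {0: np.array([2.0, 0.0])}  # the base policy prefers the unrewarded action 0
best = finetune_and_act(toy_model, n_actions=2, state=0, base_logits=base)
```

Because fine-tuning happens on a copy, the base policy can be reused unchanged at the next decision point, which is what lets this act as a drop-in replacement for per-state tabular search.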