In this paper, we extend the Descent framework, which enables learning and planning in two-player games with perfect information, to stochastic games. We propose two ways of doing this: the first generalizes the search algorithm, i.e. Descent, to stochastic games, and the second approximates stochastic games by deterministic games. We then evaluate both approaches on the game EinStein würfelt nicht! against state-of-the-art algorithms: Expectiminimax and Polygames (i.e. the AlphaZero algorithm). Our generalization of Descent obtains the best results. The approximation by deterministic games nevertheless obtains good results, suggesting that it could give better results in particular contexts.
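For readers unfamiliar with the Expectiminimax baseline mentioned above, the following minimal sketch shows how minimax search is commonly extended with chance nodes to handle dice rolls. It is not the authors' generalization of Descent, and it is not how Polygames works. The `Node` class, the `leaf` helper, and the toy tree are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical game-tree node used only for this illustration."""
    kind: str                      # "max", "min", "chance", or "leaf"
    value: float = 0.0             # payoff for the max player (leaves only)
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # chance nodes only


def expectiminimax(node: Node) -> float:
    """Return the expectiminimax value of `node` for the max player."""
    if node.kind == "leaf":
        return node.value
    values = [expectiminimax(child) for child in node.children]
    if node.kind == "max":
        return max(values)
    if node.kind == "min":
        return min(values)
    if node.kind == "chance":
        # Probability-weighted average over the random outcomes.
        return sum(p * v for p, v in zip(node.probs, values))
    raise ValueError(f"unknown node kind: {node.kind}")


def leaf(v: float) -> Node:
    return Node("leaf", value=v)


# Toy tree: the max player picks a move, a fair coin is flipped,
# then the min player replies.
tree = Node("max", children=[
    Node("chance", probs=[0.5, 0.5], children=[
        Node("min", children=[leaf(3.0), leaf(5.0)]),
        Node("min", children=[leaf(1.0), leaf(9.0)]),
    ]),
    Node("chance", probs=[0.5, 0.5], children=[
        Node("min", children=[leaf(4.0), leaf(6.0)]),
        Node("min", children=[leaf(2.0), leaf(8.0)]),
    ]),
])

root_value = expectiminimax(tree)
print(root_value)
```

In this toy tree the first move is worth 0.5 · 3 + 0.5 · 1 = 2 and the second 0.5 · 4 + 0.5 · 2 = 3, so the max player prefers the second move. A practical implementation for a game such as EinStein würfelt nicht! would add a depth limit and an evaluation function, which are omitted here.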