AlphaZero and its extension MuZero are computer programs that use machine-learning techniques to play at a superhuman level in chess, go, and a few other games. They achieved this level of play solely with reinforcement learning from self-play, without any domain knowledge except the game rules. It is a natural idea to adapt the methods and techniques used in AlphaZero for solving search problems such as the Boolean satisfiability problem (in its search version). Given a search problem, how to represent it for an AlphaZero-inspired solver? What are the "rules of solving" for this search problem? We describe possible representations in terms of easy-instance solvers and self-reductions, and we give examples of such representations for the satisfiability problem. We also describe a version of Monte Carlo tree search adapted for search problems.
翻译:AlphaZero及其扩展的MuZero是使用机器学习技术在象棋中以超人水平玩耍的计算机程序,可以去玩其他几场游戏。它们之所以能够达到这一水平,完全靠的是自我游戏的强化学习,除了游戏规则之外,没有任何领域知识。调整AlphaZero所使用的方法和技术以解决诸如布尔兰卫星可诉性问题(搜索版)等搜索问题,这是一个自然的想法。鉴于一个搜索问题,如何将其代表为阿尔法泽罗激发的解答器?这个搜索问题的“解决规则”是什么?我们用容易进入的解答器和自我缩减来描述可能的表达方式,我们用这种表达方式来说明可诉性问题的例子。我们还描述了适应搜索问题的蒙特卡洛树搜索版本。