Recent achievements of AlphaZero, which learns through self-play, have demonstrated remarkable performance on several board games. It is plausible that self-play, starting from zero knowledge, can gradually approximate a winning strategy for certain two-player games after sufficient training. In this paper, we leverage the computational power of neural Monte Carlo Tree Search (neural MCTS), the core algorithm of AlphaZero, to solve Quantified Boolean Formula Satisfaction (QSAT) problems, which are PSPACE-complete. Since every QSAT problem is equivalent to a QSAT game, the game outcome can be used to derive a solution to the original QSAT problem. We propose a way to encode Quantified Boolean Formulas (QBFs) as graphs and apply a graph neural network (GNN) to embed the QBFs into the neural MCTS framework. After training, an off-the-shelf QSAT solver is used to evaluate the performance of the algorithm. Our results show that, for problems of limited size, the algorithm learns to solve them correctly merely through self-play.
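To make the graph-encoding idea concrete, the following is a minimal illustrative sketch of one plausible way to turn a QBF into a graph for a GNN, assuming a bipartite variable/clause encoding with edge polarities; the names (`QBFGraph`, `encode_qbf`) are hypothetical and this is not necessarily the paper's actual encoding.

```python
# Illustrative sketch only: a bipartite variable/clause graph encoding of a QBF.
# This is an assumed encoding, not the paper's confirmed one.
from dataclasses import dataclass, field

@dataclass
class QBFGraph:
    # node id -> node type: "exists", "forall", or "clause"
    nodes: dict = field(default_factory=dict)
    # (variable node, clause node, polarity) triples;
    # polarity is +1 for a positive literal, -1 for a negated one
    edges: list = field(default_factory=list)

def encode_qbf(prefix, clauses):
    """prefix: ordered (quantifier, variable) pairs, e.g. [("exists", 1), ("forall", 2)].
    clauses: the CNF matrix as lists of signed ints (DIMACS-style literals)."""
    g = QBFGraph()
    for quant, var in prefix:
        g.nodes[f"v{var}"] = quant          # one node per quantified variable
    for i, clause in enumerate(clauses):
        cid = f"c{i}"
        g.nodes[cid] = "clause"             # one node per clause
        for lit in clause:
            # connect each clause to the variables it mentions, keeping the sign
            g.edges.append((f"v{abs(lit)}", cid, 1 if lit > 0 else -1))
    return g

# Example: exists x1 forall x2 . (x1 or ~x2) and (~x1 or x2)
g = encode_qbf([("exists", 1), ("forall", 2)], [[1, -2], [-1, 2]])
print(g.nodes)  # {'v1': 'exists', 'v2': 'forall', 'c0': 'clause', 'c1': 'clause'}
print(g.edges)  # [('v1', 'c0', 1), ('v2', 'c0', -1), ('v1', 'c1', -1), ('v2', 'c1', 1)]
```

A GNN can then compute node embeddings by message passing over this graph, with quantifier type and edge polarity as input features, so that formulas of varying size map to fixed-size state representations for the neural MCTS.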