Goal-achieving problems are puzzles that set up a specific situation with a clear objective. A well-studied example is the category of life-and-death (L&D) problems in Go, which help players hone their skill at identifying region safety. Many previous methods, such as lambda search, try null moves first and then derive so-called relevance zones (RZs), outside of which the opponent does not need to search. This paper first proposes a novel RZ-based approach, called RZ-Based Search (RZS), for solving L&D problems in Go. RZS tries moves first and determines post hoc whether they are null moves. This means we do not need to rely on null-move heuristics, which results in a more elegant algorithm that can also be seamlessly integrated with AlphaZero's super-human level of play in our solver. To repurpose AlphaZero for solving, we also propose a new training method called Faster to Life (FTL), which modifies AlphaZero to encourage it to win more quickly. We use RZS and FTL to solve L&D problems in Go, solving 68 of 106 problems from a professional L&D book, whereas a previous program solves only 11. Finally, we argue that the approach is generic in the sense that RZS is applicable to solving many other goal-achieving problems for board games.