Spatial State-Action Features for General Games

In many board games and other abstract games, patterns have been used as features that can guide automated game-playing agents. Such patterns or features often represent particular configurations of pieces, empty positions, etc., which may be relevant for a game's strategies. Their use has been particularly prevalent in the game of Go, but also in many other games used as benchmarks for AI research. Simple, linear policies over such features are unlikely to reach the state-of-the-art playing strength of the deep neural networks that have been more commonly used in recent years. However, they typically require significantly fewer resources to train, which is paramount for large-scale studies of hundreds to thousands of distinct games. In this paper, we formulate a design and efficient implementation of spatial state-action features for general games. These are patterns that can be trained to incentivise or disincentivise actions based on whether or not they match variables of the state in a local area around action variables. We provide extensive details on several design and implementation choices, with a primary focus on achieving a high degree of generality to support a wide variety of different games using different board geometries or other graphs. We also propose an efficient approach for evaluating which features are active for any given set of features. In this approach, we take inspiration from heuristics used in problems such as SAT to optimise the order in which parts of patterns are matched and to prune unnecessary evaluations. An empirical evaluation on 33 distinct games in the Ludii general game system demonstrates the efficiency of this approach in comparison to a naive baseline, as well as a baseline based on prefix trees.
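To make the idea of a linear policy over spatial state-action features concrete, the following is a minimal sketch and not the paper's implementation. It assumes a square grid board, actions that place a piece on a single cell, and features written as lists of (row offset, column offset, required content) tests relative to that cell. All names here (`content`, `is_active`, `policy`, the cell constants) are hypothetical and chosen only for illustration. Each feature is checked independently, which corresponds to a naive way of evaluating active features rather than the optimised matching order described in the abstract.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical cell contents, for illustration only.
EMPTY, FRIEND, ENEMY, OFF_BOARD = 0, 1, 2, 3

Cell = Tuple[int, int]
# A feature is a list of (row offset, column offset, required content) tests,
# all relative to the cell that the action would place a piece on.
Feature = List[Tuple[int, int, int]]


def content(board: List[List[int]], cell: Cell) -> int:
    """Return the content of a cell, or OFF_BOARD if it lies outside the grid."""
    r, c = cell
    if 0 <= r < len(board) and 0 <= c < len(board[0]):
        return board[r][c]
    return OFF_BOARD


def is_active(feature: Feature, board: List[List[int]], action: Cell) -> bool:
    """A feature is active for an action if every one of its tests matches."""
    r, c = action
    return all(content(board, (r + dr, c + dc)) == want for dr, dc, want in feature)


def policy(board: List[List[int]], legal_actions: List[Cell],
           features: List[Feature], weights: List[float]) -> Dict[Cell, float]:
    """Softmax over the summed weights of the features active for each action."""
    logits = [
        sum(w for f, w in zip(features, weights) if is_active(f, board, a))
        for a in legal_actions
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return {a: e / total for a, e in zip(legal_actions, exps)}


# Usage: one friendly piece on the left edge of a 3x3 board.
board = [
    [EMPTY, EMPTY, EMPTY],
    [FRIEND, EMPTY, EMPTY],
    [EMPTY, EMPTY, EMPTY],
]
features = [
    [(0, -1, FRIEND)],     # friendly piece directly to the left of the target
    [(-1, 0, OFF_BOARD)],  # target cell lies on the top edge of the board
]
weights = [1.0, -0.5]      # made-up weights; in practice these would be trained
probs = policy(board, [(1, 1), (0, 2)], features, weights)
```

In this sketch a positive weight incentivises actions for which the feature is active and a negative weight disincentivises them, so the move next to the friendly piece receives the higher probability.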

