神经符号并发随机零和策略合成游戏的策略合成 (Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games)

Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise probabilistic finite-state agents interacting in a shared continuous-state environment observed through perception mechanisms implemented as neural networks (NNs). We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards under piecewise-constant restrictions on the components of this class of models. To compute values and synthesise strategies, we present, for the first time, implementable value iteration (VI) and policy iteration (PI) algorithms to solve a class of continuous-state CSGs. These require a finite representation of the pre-image of the environment's NN perception mechanism and rely on finite abstract representations of value functions and strategies closed under VI or PI. First, we introduce a Borel measurable piecewise-constant (B-PWC) representation of value functions, extend minimax backups to this representation and propose B-PWC VI. Second, we introduce two novel representations for the value functions and strategies, constant-piecewise-linear (CON-PWL) and constant-piecewise-constant (CON-PWC) respectively, and propose Minimax-action-free PI by extending a recent PI method based on alternating player choices for finite state spaces to Borel state spaces, which does not require normal-form games to be solved. We illustrate our approach with a dynamic vehicle parking example by generating approximately optimal strategies using a prototype implementation of the B-PWC VI algorithm.

翻译：神经符号人工智能的提出，将神经网络与古典符号技术结合起来，这种方法不断增长，越来越受到重视，有必要采用正式的方法来证明它们的正确性。本文提出了一种新的建模形式，称为神经符号并发Stochastic Games (NS-CSGs)，由相互作用的概率状态代理在共享的连续状态环境中，通过实现为神经网络（NNs）的感知机制来观测。我们专注于Borel状态空间的NS-CSGs类型，并在此类模型的组成件上，证明在对零和折扣累积奖励的价值函数的分段常数限制下，价值函数的存在性和可测性。为了计算值和综合策略，我们首次提出了可实现的值迭代（VI）和策略迭代（PI）算法，以解决一类连续状态CSG。这些算法需要环境NN感知机制的有限表示，并依赖于值函数和策略的有限抽象表示，以VI或PI结束。首先，我们引入了Borel可测的分段常数（B-PWC）值函数表示形式，将minimax备份扩展到这个表示形式，并提出B-PWC VI。其次，我们引入了两种价值函数和策略的新表示形式：恒定分段线性表（CON-PWL）和恒定分段常数表（CON-PWC），并通过扩展一个最近基于交替玩家选择进行有限状态空间的PI方法，将Minimax-Action-Free PI引入到Borel状态空间中，这不需要求解正则形式游戏。我们通过一个动态车辆停车的例子来说明我们的方法，通过使用B-PWC VI算法的原型实现，生成了近似最优策略。