While discounted payoff games and classic games that reduce to them, like parity and mean-payoff games, are symmetric, their solutions are not. We have taken a fresh view on the properties that optimal solutions need to have, and devised a novel way to converge to them, which is entirely symmetric. We achieve this by building a constraint system that uses every edge to define an inequation, and update the objective function by taking a single outgoing edge for each vertex into account. These edges loosely represent strategies of both players, where the objective function intuitively asks to make the inequation to these edges sharp. In fact, where they are not sharp, there is an `error' represented by the difference between the two sides of the inequation, which is 0 where the inequation is sharp. Hence, the objective is to minimise the sum of these errors. For co-optimal strategies, and only for them, it can be achieved that all selected inequations are sharp or, equivalently, that the sum of these errors is zero. While no co-optimal strategies have been found, we step-wise improve the error by improving the solution for a given objective function or by improving the objective function for a given solution. This also challenges the gospel that methods for solving payoff games are either based on strategy improvement or on value iteration.
翻译:尽管折扣收益博弈及其可归约的经典博弈(如奇偶博弈与均值收益博弈)具有对称性,但其解并不对称。本文重新审视了最优解所需满足的性质,并提出了一种完全对称的新方法以收敛至最优解。我们通过构建一个约束系统实现这一目标,其中每条边定义一个不等式,并通过为每个顶点考虑一条出边来更新目标函数。这些边松散地表示双方玩家的策略,而目标函数直观地要求使这些边对应的不等式变为严格等式。实际上,当不等式非严格时,存在由不等式两侧差值表示的“误差”,该误差在不等式严格时为0。因此,目标是最小化这些误差的总和。对于共优策略且仅在此情况下,可以实现所有选定不等式均为严格等式,或等价地使误差总和为零。虽然尚未找到共优策略,我们通过改进给定目标函数的解或改进给定解的目标函数,逐步优化误差。这一方法也对“收益博弈求解方法仅基于策略改进或值迭代”的传统观点提出了挑战。