Zero-Sum Semi-Markov Games with State-Action-Dependent Discount Factors

The semi-Markov model is one of the most general models for stochastic dynamic systems. This paper studies a two-person zero-sum game for semi-Markov processes under the expected discounted payoff criterion with state-action-dependent discount factors. The state and action spaces are both Polish spaces, and the payoff function is $\omega$-bounded. We first construct a fairly general model of semi-Markov games from a given semi-Markov kernel and a pair of strategies. Next, building on the standard regularity condition and the continuity-compactness condition for semi-Markov games, we derive a "drift condition" on the semi-Markov kernel and assume that the discount factors have a positive lower bound. Under these conditions, we use the Shapley equation to prove the existence of the value function and of a pair of optimal stationary strategies for the game. Moreover, when the state and action spaces are both finite, we develop a value iteration-type algorithm for computing the value function and an $\varepsilon$-Nash equilibrium of the game, and we prove that the algorithm converges. Finally, we present numerical examples that illustrate the main results.
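To make the finite case concrete, the following is a minimal sketch of a value iteration on a Shapley-type equation, not the paper's own algorithm. It rests on several assumptions that are not taken from the abstract: each player has exactly two actions in every state (so each matrix game has a closed-form value), sojourn times are exponential with a hypothetical rate `lam[i][a][b]`, payoffs are lump sums `r[i][a][b]` received at the start of each sojourn, and the discount rate `alpha[i][a][b]` is bounded below by a positive constant. Under these assumptions the effective discounted kernel is $\beta(j \mid i,a,b) = p(j \mid i,a,b)\,\lambda(i,a,b)/(\lambda(i,a,b)+\alpha(i,a,b))$, and the iteration applies $V_{k+1}(i) = \mathrm{val}_{a,b}\big[r(i,a,b) + \sum_j \beta(j \mid i,a,b)\,V_k(j)\big]$. All function names and numerical values are illustrative.

```python
def matrix_game_value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best pure-strategy guarantee for the row player
    minimax = min(max(a, c), max(b, d))  # best pure-strategy guarantee for the column player
    if maximin == minimax:               # pure saddle point
        return maximin
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def discounted_kernel(p, lam, alpha):
    """beta[i][a][b][j] = p[i][a][b][j] * E[exp(-alpha * T)] with T ~ Exp(lam).

    Assumption: exponential sojourn times, so E[exp(-alpha * T)] = lam / (lam + alpha).
    """
    n = len(p)
    return [[[[p[i][a][b][j] * lam[i][a][b] / (lam[i][a][b] + alpha[i][a][b])
               for j in range(n)]
              for b in range(2)]
             for a in range(2)]
            for i in range(n)]


def shapley_operator(v, r, beta):
    """Apply one step of the Shapley-type operator to the value vector v."""
    n = len(v)
    out = []
    for i in range(n):
        # Auxiliary matrix game in state i: immediate payoff plus discounted continuation.
        m = [[r[i][a][b] + sum(beta[i][a][b][j] * v[j] for j in range(n))
              for b in range(2)]
             for a in range(2)]
        out.append(matrix_game_value_2x2(m))
    return out


def value_iteration(r, beta, tol=1e-10, max_iter=10_000):
    """Iterate the operator from v = 0 until successive iterates differ by less than tol."""
    v = [0.0] * len(r)
    for _ in range(max_iter):
        v_next = shapley_operator(v, r, beta)
        if max(abs(x - y) for x, y in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    raise RuntimeError("value iteration did not converge")


# Hypothetical two-state game; every number below is illustrative only.
r = [[[3.0, 1.0], [0.0, 2.0]], [[1.0, 4.0], [2.0, 0.5]]]
p = [[[[0.5, 0.5], [0.2, 0.8]], [[0.9, 0.1], [0.4, 0.6]]],
     [[[0.3, 0.7], [0.6, 0.4]], [[0.5, 0.5], [0.1, 0.9]]]]
lam = [[[1.0, 2.0], [1.5, 1.0]], [[2.0, 1.0], [1.0, 3.0]]]
alpha = [[[0.5, 0.8], [0.6, 1.0]], [[0.7, 0.5], [0.9, 0.6]]]  # state-action-dependent, all > 0

beta = discounted_kernel(p, lam, alpha)
v_star = value_iteration(r, beta)
print(v_star)
```

Because every `alpha` entry is positive and every `lam` entry is finite, each row of `beta` sums to less than one, so the operator in this sketch is a sup-norm contraction and the iteration converges. The sketch computes only the value vector; recovering $\varepsilon$-optimal stationary strategies would additionally require the mixed strategies of each auxiliary matrix game, and action sets with more than two actions would need a linear-programming solver in place of the closed-form 2x2 value.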
