We show that an N-person non-cooperative perfect information semi-Markov game under the limiting ratio average pay-off criterion has a pure semi-stationary Nash equilibrium. The zero-sum two-person case was treated in an earlier paper. The proof proceeds by reducing such perfect information games to an associated semi-Markov decision process (SMDP) and then applying existence results from the theory of SMDPs. This reduction procedure also yields simple proofs of the following: (a) zero-sum two-person perfect information stochastic (Markov) games have a value and pure stationary optimal strategies for both players under the discounted as well as the undiscounted pay-off criteria; (b) similar conclusions hold for N-person non-cooperative perfect information stochastic games. All such games can be solved using any efficient algorithm for the reduced SMDP (an MDP in the case of stochastic games). In this paper we implement Mondal's algorithm to solve an SMDP under the limiting ratio average pay-off criterion. To avoid notational complications, we take N = 2 in our proof.
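As a small illustration of claim (a) in the discounted case only, the following sketch runs plain value iteration on a toy zero-sum perfect information stochastic game, in which each state is controlled by exactly one player. It is not Mondal's algorithm and does not treat the semi-Markov or limiting ratio average setting. The game data, the discount factor, and the names `TOY_GAME` and `solve_discounted_pi_game` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy game (not from the paper).
# state -> (controller, [(reward to player 1, {next_state: probability}), ...])
# controller 1 maximizes player 1's pay-off, controller 2 minimizes it.
Game = Dict[int, Tuple[int, List[Tuple[float, Dict[int, float]]]]]

TOY_GAME: Game = {
    0: (1, [(1.0, {0: 1.0}),    # action 0: reward 1, stay in state 0
            (0.0, {1: 1.0})]),  # action 1: reward 0, move to state 1
    1: (2, [(2.0, {0: 1.0}),    # action 0: reward 2, move to state 0
            (3.0, {1: 1.0})]),  # action 1: reward 3, stay in state 1
}


def action_values(game: Game, v: Dict[int, float], s: int, beta: float) -> List[float]:
    """One-step look-ahead value of every action available in state s."""
    _, actions = game[s]
    return [r + beta * sum(p * v[t] for t, p in trans.items())
            for r, trans in actions]


def solve_discounted_pi_game(game: Game, beta: float = 0.9,
                             tol: float = 1e-10, max_iter: int = 100_000):
    """Value iteration: max in player 1's states, min in player 2's states."""
    v = {s: 0.0 for s in game}
    for _ in range(max_iter):
        new_v = {}
        for s, (controller, _) in game.items():
            vals = action_values(game, v, s, beta)
            new_v[s] = max(vals) if controller == 1 else min(vals)
        diff = max(abs(new_v[s] - v[s]) for s in game)
        v = new_v
        if diff < tol:
            break
    # Read off a pure stationary strategy: one action index per state.
    policy = {}
    for s, (controller, _) in game.items():
        vals = action_values(game, v, s, beta)
        pick = max if controller == 1 else min
        policy[s] = pick(range(len(vals)), key=vals.__getitem__)
    return v, policy


values, policy = solve_discounted_pi_game(TOY_GAME, beta=0.5)
print(values, policy)
```

Because only one player moves in each state, the one-step operator needs only a max or a min rather than the solution of a matrix game, which is why a pure stationary strategy can be read off directly from the converged values.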