This paper presents a learning dynamic with an almost sure convergence guarantee for any stochastic game with turn-based controllers (on state transitions), as long as the stage-payoffs induce a zero-sum or identical-interest game. The stage-payoffs can even have different structures in different states, e.g., summing to zero in some states and being identical in others. The presented dynamic combines classical stochastic fictitious play with value iteration for stochastic games. Two properties are key: (i) players play finite-horizon stochastic games of increasing length within the underlying infinite-horizon stochastic game, and (ii) the turn-based controllers ensure that the auxiliary stage-games (induced by the estimated continuation payoffs) are strategically equivalent to zero-sum or identical-interest games.