This paper presents a decentralized learning algorithm for stochastic games, also known as Markov games (MGs). The algorithm is radically uncoupled, model-free, rational, and convergent: it reaches near-equilibrium in two-agent zero-sum and identical-interest MGs, and in certain multi-agent general-sum MGs, for both the discounted and the time-averaged reward criteria. The paper introduces additive-reward product (ARP) games as a new class of Markov games to address convergence beyond the zero-sum and identical-interest cases. In ARP games, the state is a composition of local states; each local state is controlled by a single agent and contributes to the rewards additively. The algorithm converges almost surely to a near-equilibrium in general-sum ARP games, provided that the strategic-form games induced by the immediate rewards are strategically equivalent to either two-agent zero-sum games or potential games. The approximation errors in these results decay with the episode length, and the convergence results generalize to more than two agents when the strategic-form games induced by the rewards are polymatrix games.
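As a rough illustration of the ARP structure described above, the following sketch uses notation assumed for exposition rather than taken from the paper (in particular, whether each reward term depends on the full action profile or only on the controlling agent's action is left to the paper's formal definition):

\[
  s = (s^{1},\dots,s^{n}) \in \mathcal{S}^{1}\times\cdots\times\mathcal{S}^{n},
  \qquad
  s^{j}_{t+1} \sim P^{j}\!\bigl(\cdot \mid s^{j}_{t},\, a^{j}_{t}\bigr),
  \qquad
  r^{i}(s,a) \;=\; \sum_{j=1}^{n} r^{i}_{j}\bigl(s^{j}, a\bigr),
\]

i.e., the global state is a product of local states, agent \(j\)'s action drives only the transition of its own local state \(s^{j}\), and each agent's reward decomposes additively across the local states.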