Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we derive that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state discounted Stochastic Game within the exponential precision is \textbf{PPAD}-complete. We adopt a function with a polynomially bounded description in the strategy space to convert the MPE computation to a fixed-point problem, even though the stochastic game may demand a number of pure strategies that is exponential in the number of states for each agent. The completeness result follows from a reduction of the fixed-point problem to {\sc End of the Line}. Our results indicate that finding an MPE in SGs is highly unlikely to be \textbf{NP}-hard unless \textbf{NP}=\textbf{co-NP}. Our work offers confidence for MARL research to study MPE computation on general-sum SGs and to develop fruitful algorithms, as is currently done for zero-sum SGs.
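For context, the following is an illustrative sketch of the objects involved; the notation and the specific fixed-point function are assumptions for exposition and need not match the paper's exact construction. In a discounted SG with state set $S$ and stationary Markov strategy profile $\pi=(\pi_1,\dots,\pi_n)$, an $\varepsilon$-approximate MPE requires each agent's strategy to be near-optimal at every state:
\[
V_i^{(\pi_i,\pi_{-i})}(s) \;\ge\; \max_{\pi_i'} V_i^{(\pi_i',\pi_{-i})}(s) \;-\; \varepsilon
\qquad \text{for every agent } i \text{ and every state } s .
\]
One standard way to phrase MPE computation as a fixed-point problem is a Nash-style improvement map applied state-wise to the one-step action values $Q_i^{\pi}(s,a_i)$ (against $\pi_{-i}$), e.g.
\[
f(\pi)_i(s,a_i) \;=\;
\frac{\pi_i(s,a_i) + \max\!\bigl\{0,\; Q_i^{\pi}(s,a_i) - V_i^{\pi}(s)\bigr\}}
     {1 + \sum_{a_i'} \max\!\bigl\{0,\; Q_i^{\pi}(s,a_i') - V_i^{\pi}(s)\bigr\}} ,
\]
whose fixed points are exactly the profiles that are stage-wise unimprovable, i.e., MPEs by the one-step deviation principle. Such a map is defined directly on the (polynomially sized) space of stationary Markov strategies, which is how a polynomially bounded description can coexist with an exponential number of pure Markov strategies.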