Multi-agent reinforcement learning (MARL) has become effective in tackling discrete cooperative game scenarios. However, MARL has yet to penetrate settings beyond those modelled by team and zero-sum games, confining it to a small subset of multi-agent systems. In this paper, we introduce a new generation of MARL learners that can handle nonzero-sum payoff structures and continuous settings. In particular, we study the MARL problem in a class of games known as stochastic potential games (SPGs) with continuous state-action spaces. Unlike cooperative games, in which all agents share a common reward, SPGs can model real-world scenarios where agents seek to fulfil their individual goals. We prove theoretically that our learning method, SPot-AC, enables independent agents to learn Nash equilibrium strategies in polynomial time.
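For context, a minimal sketch of the exact potential condition that characterises SPGs, stated per state for the stage rewards; the symbols $R_i$ (agent $i$'s reward) and $\phi$ (the potential function) are illustrative and not necessarily the paper's exact notation:
\[
  R_i\bigl(s, (a_i, a_{-i})\bigr) - R_i\bigl(s, (a_i', a_{-i})\bigr)
  \;=\;
  \phi\bigl(s, (a_i, a_{-i})\bigr) - \phi\bigl(s, (a_i', a_{-i})\bigr),
  \qquad \forall i,\; \forall s,\; \forall a_i, a_i',\; \forall a_{-i},
\]
i.e. any unilateral deviation by an agent changes its own reward by exactly the same amount as it changes the shared potential, which is what allows individually motivated agents to be analysed through a single function.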