Partially Observable Stochastic Games (POSGs) are the most general and most common model of games used in Multi-Agent Reinforcement Learning (MARL). We argue that the POSG model is conceptually ill-suited to software MARL environments, and we offer case studies from the literature where this mismatch has led to highly unexpected behavior. In response, we introduce the Agent Environment Cycle Games (AEC Games) model, which is more representative of software implementations. We then prove that it is equivalent to the POSG model. The AEC Games model is also uniquely useful in that it can elegantly represent all forms of MARL environments, whereas POSGs, for example, cannot elegantly represent strictly turn-based games like chess.
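The abstract does not define the AEC Games model formally, so the following is only an illustrative sketch of the kind of software loop it alludes to. It assumes that "cycle" means agents act one at a time, with the environment updating after each individual action, which is how a strictly turn-based game would naturally be coded. The class `TurnBasedEnv`, the function `run_episode`, the counting game, and the agent names are all hypothetical and are not taken from the paper or from any library.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Dict, List, Tuple


@dataclass
class TurnBasedEnv:
    """Hypothetical toy game: agents take turns adding to a shared counter.

    The agent whose move brings the counter to `target` or beyond wins.
    """
    agents: List[str]
    target: int = 5
    total: int = 0
    done: bool = False
    rewards: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.rewards = {agent: 0.0 for agent in self.agents}
        self._order = cycle(self.agents)          # fixed turn order (assumption)
        self.current_agent = next(self._order)    # exactly one agent acts at a time

    def observe(self, agent: str) -> int:
        # In this toy game every agent simply sees the shared counter.
        return self.total

    def step(self, action: int) -> None:
        # The environment updates immediately after a single agent's action.
        self.total += action
        if self.total >= self.target:
            self.rewards[self.current_agent] += 1.0
            self.done = True
        else:
            self.current_agent = next(self._order)


def run_episode(
    env: TurnBasedEnv,
    policies: Dict[str, Callable[[int], int]],
    max_steps: int = 100,
) -> List[Tuple[str, int]]:
    """Sequential loop: observe, act, let the environment update, repeat."""
    history: List[Tuple[str, int]] = []
    for _ in range(max_steps):
        if env.done:
            break
        agent = env.current_agent
        action = policies[agent](env.observe(agent))
        env.step(action)
        history.append((agent, action))
    return history


env = TurnBasedEnv(agents=["player_0", "player_1"], target=5)
policies = {"player_0": lambda obs: 1, "player_1": lambda obs: 2}
history = run_episode(env, policies)
```

In this sketch no joint action is ever assembled: each call to `step` consumes a single agent's action. A simultaneous-move formulation would instead have to pad the non-acting agents' actions with no-ops, which is one plausible reading of why the abstract says turn-based games fit POSGs inelegantly.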