Multiagent learning settings are inherently more difficult than single-agent learning because each agent interacts with other simultaneously learning agents in a shared environment. An effective approach in multiagent reinforcement learning is to consider the learning processes of other agents and to influence their future policies toward behaviors that are desirable from each agent's own perspective. Importantly, if each agent maximizes its long-term rewards by accounting for the impact of its behavior on the set of policies to which the agents converge, the resulting multiagent system reaches an active equilibrium. While this new solution concept is general, in the sense that standard solution concepts such as a Nash equilibrium are special cases of active equilibria, it is unclear when an active equilibrium is preferable to other solution concepts. In this paper, we analyze active equilibria from a game-theoretic perspective by closely studying examples in which Nash equilibria are known. By directly comparing active equilibria to Nash equilibria in these examples, we show that active equilibria yield more effective solutions than Nash equilibria, and we conclude that an active equilibrium is the desired solution concept for multiagent learning settings.
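As a rough sketch of the distinction (the notation below is assumed for illustration and is not taken from the abstract): write $V_i$ for agent $i$'s long-term return, $\pi_i$ for its policy, $\pi_{-i}^{*}$ for the other agents' equilibrium policies, and $\pi_{-i}^{\infty}(\pi_i)$ for the policies the other agents' learning dynamics converge to when agent $i$ commits to $\pi_i$. A Nash equilibrium requires a best response against fixed opponent policies,

    $\pi_i^{*} \in \arg\max_{\pi_i} V_i\bigl(\pi_i, \pi_{-i}^{*}\bigr)$ for every agent $i$,

whereas an active equilibrium, informally, requires each agent to best-respond to the convergence policies its own behavior induces,

    $\pi_i^{*} \in \arg\max_{\pi_i} V_i\bigl(\pi_i, \pi_{-i}^{\infty}(\pi_i)\bigr)$ for every agent $i$.

Under this view, the Nash condition corresponds, roughly, to the special case in which $\pi_{-i}^{\infty}$ does not depend on $\pi_i$.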