Reinforcement learning (RL) has recently achieved tremendous successes in many artificial intelligence applications. Many of the forefront applications of RL involve multiple agents, e.g., playing chess and Go, autonomous driving, and robotics. Unfortunately, the framework upon which classical RL builds is inappropriate for multi-agent learning, as it assumes an agent's environment is stationary and does not take into account the adaptivity of other agents. In this review paper, we present the model of stochastic games for multi-agent learning in dynamic environments. We focus on the development of simple and independent learning dynamics for stochastic games: each agent is myopic and chooses best-response-type actions against the other agents' strategies, without any coordination with her opponents. There has been limited progress on developing convergent best-response-type independent learning dynamics for stochastic games. We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum stochastic games, together with a review of other contemporaneous algorithms for dynamic multi-agent learning in this setting. Along the way, we also reexamine some classical results from both the game theory and RL literatures, to situate both the conceptual contributions of our independent learning dynamics and the mathematical novelties of our analysis. We hope this review paper serves as an impetus for the resurgence of studying independent and natural learning dynamics in game theory, for the more challenging settings of dynamic environments.
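To make the notion of best-response-type independent learning concrete, the following is a minimal sketch in a two-player zero-sum matrix game (matching pennies), using smoothed fictitious play as an illustrative stand-in: each player independently tracks the empirical frequency of the opponent's past actions and plays a softmax-smoothed best response to that belief, with no coordination between the players. The payoff matrix, the temperature `tau`, and all names here are hypothetical illustration choices, not the specific dynamics proposed in the paper.

```python
import numpy as np

# Smoothed fictitious play in matching pennies: an illustrative (assumed)
# instance of best-response-type independent learning, not the paper's
# exact dynamics for stochastic games.

rng = np.random.default_rng(0)

# Payoff matrix of player 1 (row player); player 2 receives the negative.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

tau = 0.1          # smoothing temperature for the softmax best response
T = 50_000         # number of play rounds

counts1 = np.ones(2)   # player 1's counts of player 2's past actions
counts2 = np.ones(2)   # player 2's counts of player 1's past actions

def smoothed_best_response(payoffs, tau):
    """Softmax ('smoothed') best response to a vector of expected payoffs."""
    z = payoffs / tau
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

for t in range(T):
    belief1 = counts1 / counts1.sum()     # player 1's belief about player 2
    belief2 = counts2 / counts2.sum()     # player 2's belief about player 1

    # Expected payoff of each own action against the believed opponent mix.
    p1 = smoothed_best_response(A @ belief1, tau)        # row player
    p2 = smoothed_best_response(-(A.T @ belief2), tau)   # column player

    a1 = rng.choice(2, p=p1)
    a2 = rng.choice(2, p=p2)

    counts1[a2] += 1   # each player only records what the opponent played
    counts2[a1] += 1

# In this zero-sum game the empirical frequencies of play should approach
# the unique mixed equilibrium (0.5, 0.5) for both players.
print("player 1 empirical mix:", counts2 / counts2.sum())
print("player 2 empirical mix:", counts1 / counts1.sum())
```

In a stochastic game the analogous dynamics would additionally maintain per-state value estimates that evolve on a slower timescale, which is precisely where the convergence analysis becomes challenging.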