Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

Recent years have witnessed significant advances in reinforcement learning (RL), which has registered great success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve more than one agent and thus naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history that has recently re-emerged due to advances in single-agent RL techniques. Though MARL has been empirically successful, its theoretical foundations are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with a focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, and (non-)convergence of policy-based methods for learning in games. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an accurate assessment of the current state of the field, to identify fruitful future research directions for theoretical studies of MARL. We expect this chapter to serve as a continuing stimulus for researchers interested in working on this exciting yet challenging topic.
