Nash Q-learning is one of the earliest and best-known algorithms in multi-agent reinforcement learning (MARL) for learning policies that constitute a Nash equilibrium of an underlying general-sum Markov game. Its original analysis was restricted to the tabular case and provided only asymptotic guarantees. More recently, finite-sample guarantees have been established for the tabular case using modern RL techniques. Our work analyzes Nash Q-learning with linear function approximation -- a representation regime used when the state space is large or continuous -- and provides finite-sample guarantees that demonstrate its sample efficiency. We find that the resulting bound nearly matches an existing efficient result for single-agent RL under the same representation, and exhibits only a polynomial gap compared to the best-known result for the tabular case.
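For orientation, the classical tabular Nash Q-learning update (not reproduced in the abstract) maintains a Q-function per agent and, roughly, takes the form below; the step size $\alpha_t$, discount factor $\gamma$, reward $r^i_t$, and next state $s'$ follow standard notation and are stated here only as an illustrative sketch, not as the exact update analyzed in this work.

\[
Q^i_{t+1}(s, a^1, \dots, a^n) \;\leftarrow\; (1 - \alpha_t)\, Q^i_t(s, a^1, \dots, a^n) \;+\; \alpha_t \Big[ r^i_t + \gamma \, \mathrm{NashQ}^i_t(s') \Big],
\]

where $\mathrm{NashQ}^i_t(s')$ denotes agent $i$'s expected payoff at a Nash equilibrium of the stage game defined by the current estimates $Q^1_t(s', \cdot), \dots, Q^n_t(s', \cdot)$. Under linear function approximation, each $Q^i$ would instead be parameterized as $Q^i(s, a^1, \dots, a^n) \approx \phi(s, a^1, \dots, a^n)^\top w^i$ for a known feature map $\phi$; this parameterization is the standard one for the linear regime and is given here only as context for the setting the abstract describes.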