The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
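For reference, a minimal sketch of the smooth Q-learning dynamics alluded to above, assuming the standard continuous-time Boltzmann form studied in the Q-learning dynamics literature; the symbols $x_{ik}$, $r_{ik}$ and the per-agent exploration rate $T_i$ are illustrative notation rather than this paper's own:
\[
\dot{x}_{ik} \;=\; x_{ik}\Big( r_{ik}(x_{-i}) - \sum_{j} x_{ij}\, r_{ij}(x_{-i}) \Big) \;-\; T_i\, x_{ik}\Big( \ln x_{ik} - \sum_{j} x_{ij} \ln x_{ij} \Big),
\]
where $x_{ik}$ is the probability that agent $i$ assigns to action $k$, $r_{ik}(x_{-i})$ is its expected reward for action $k$ against the other agents' strategies, and $T_i > 0$ is agent $i$'s exploration rate. Larger $T_i$ weights the entropy (exploration) term more heavily against game rewards, and the fixed points of these dynamics are precisely the quantal-response equilibria.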