In this paper, we propose a general meta-learning approach to computing approximate Nash equilibria for finite $n$-player normal-form games. Unlike existing solutions that approximate or learn a Nash equilibrium from scratch for each game, our meta-solver directly constructs a mapping from a game's utility matrix to a joint strategy profile. The mapping is parameterized and learned in a self-supervised fashion via a proposed Nash equilibrium approximation metric, without any ground-truth data on Nash equilibria. As such, it can immediately predict a joint strategy profile that approximates a Nash equilibrium for any unseen game drawn from the same game distribution. Moreover, the meta-solver can be further fine-tuned and adapted to a new game if iterative updates are allowed. We theoretically prove that our meta-solver is unaffected by the non-smoothness of exact Nash equilibrium solutions, and derive a sample complexity bound demonstrating its generalization ability across normal-form games. Experimental results demonstrate its substantial approximation power against strong baselines in both adaptive and non-adaptive settings.
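As a rough illustration of the self-supervised idea (not the paper's actual architecture), the sketch below trains a tiny solver directly on an equilibrium-approximation metric, here exploitability, with no equilibrium labels. The linear map, the 2x2 game sizes, and the finite-difference training loop are all illustrative assumptions standing in for the learned meta-solver and backpropagation.

```python
import numpy as np

def exploitability(A, B, x, y):
    """Nash-approximation metric: total gain available to the players from
    unilateral best responses. It is zero exactly at a Nash equilibrium."""
    gain_1 = np.max(A @ y) - x @ A @ y    # player 1's best-response gain
    gain_2 = np.max(B.T @ x) - x @ B @ y  # player 2's best-response gain
    return gain_1 + gain_2

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def solve(params, A, B):
    """Hypothetical 'meta-solver': a linear map from the flattened payoff
    matrices to each player's strategy logits (illustrative only)."""
    W1, W2 = params
    feats = np.concatenate([A.ravel(), B.ravel()])
    return softmax(W1 @ feats), softmax(W2 @ feats)

def loss(params, games):
    # Self-supervised objective: mean exploitability over sampled games,
    # requiring no ground-truth equilibrium data.
    return float(np.mean([exploitability(A, B, *solve(params, A, B))
                          for A, B in games]))

rng = np.random.default_rng(0)
games = [(rng.normal(size=(2, 2)), rng.normal(size=(2, 2))) for _ in range(16)]
params = [0.1 * rng.normal(size=(2, 8)) for _ in range(2)]

# Train by central finite differences (a stand-in for backprop in this sketch).
lr, eps = 0.2, 1e-4
losses = [loss(params, games)]
for _ in range(100):
    for P in params:
        grad = np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            P[idx] += eps; up = loss(params, games)
            P[idx] -= 2 * eps; down = loss(params, games)
            P[idx] += eps
            grad[idx] = (up - down) / (2 * eps)
        P -= lr * grad
    losses.append(loss(params, games))
```

The metric vanishes at a known equilibrium (e.g., uniform play in matching pennies gives exploitability 0), and minimizing its mean over the game distribution drives the solver toward approximate equilibria without ever seeing an equilibrium label.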