In iterative approaches to empirical game-theoretic analysis (EGTA), the strategy space is expanded incrementally based on analysis of intermediate game models. A common approach to strategy exploration, represented by the double oracle algorithm, is to add strategies that best-respond to a current equilibrium. This approach may suffer from overfitting and other limitations, which led the developers of the policy-space response oracle (PSRO) framework for iterative EGTA to generalize the target of best response, employing what they term meta-strategy solvers (MSSs). Noting that many MSSs can be viewed as perturbed or approximated versions of Nash equilibrium, we adopt an explicit regularization perspective on the specification and analysis of MSSs. We propose a novel MSS called regularized replicator dynamics (RRD), which simply truncates the replicator dynamics process based on a regret criterion. We show that RRD is more adaptive than existing MSSs and outperforms them in various games. We extend our study to three-player games, for which the payoff matrix is cubic in the number of strategies, so exhaustively evaluating profiles may not be feasible. We propose a profile search method that can identify solutions from incomplete models, and combine it with iterative model construction using a regularized MSS. Finally, and most importantly, we reveal through experiments that the regret of best-response targets has a tremendous influence on the performance of strategy exploration, which provides an explanation for the effectiveness of regularization in PSRO.
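The core idea behind RRD can be illustrated with a small sketch: run replicator dynamics on a symmetric two-player empirical game, but stop early once the regret of the current mixed profile drops below a threshold, rather than iterating to full convergence. This is only a minimal illustration of the truncation idea under assumed parameter names (`regret_threshold`, `lr`, `max_steps` are illustrative, not from the paper).

```python
import numpy as np

def regret(A, x):
    """Regret of symmetric mixed profile x under payoff matrix A:
    best pure-deviation payoff minus current expected payoff."""
    payoffs = A @ x
    return payoffs.max() - x @ payoffs

def rrd(A, x0, regret_threshold=0.01, lr=0.1, max_steps=10_000):
    """Replicator dynamics truncated by a regret criterion (a sketch
    of the RRD idea; hyperparameters here are hypothetical)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        if regret(A, x) <= regret_threshold:
            break  # truncate: profile is already a low-regret solution
        payoffs = A @ x
        avg = x @ payoffs
        # discrete-time replicator update: grow strategies that
        # earn above the population-average payoff
        x = x * (1.0 + lr * (payoffs - avg))
        x = np.clip(x, 1e-12, None)
        x /= x.sum()
    return x
```

For instance, on a two-strategy game with a dominant second strategy, e.g. `A = [[3, 0], [5, 1]]`, the truncated dynamics quickly concentrate mass on that strategy and halt once regret falls below the threshold, returning an approximate rather than exact equilibrium.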