Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis

In empirical game-theoretic analysis (EGTA), game models are extended iteratively by generating new strategies based on learning from experience with prior strategies. The strategy exploration problem in EGTA is how to direct this process so as to construct effective models with minimal iteration. A variety of approaches have been proposed in the literature, including methods based on classic techniques and novel concepts. Comparing the performance of these alternatives can be surprisingly subtle, depending sensitively on the criteria adopted and the measures employed. We investigate some of the methodological considerations in evaluating strategy exploration, defining key distinctions and identifying a few general principles based on examples and experimental observations. In particular, we emphasize that empirical games create a space of strategies that should be evaluated as a whole. On this basis, we suggest that the minimum regret constrained profile (MRCP) provides a particularly robust basis for evaluating a space of strategies, and we propose a local search method for MRCP that outperforms previous approaches. However, computing MRCP is not always feasible, especially in large games. For such cases, we highlight consistency considerations for comparing across different approaches. Surprisingly, we find that recent works violate these considerations, which are necessary for sound evaluation, and this may lead to misleading conclusions about the performance of different approaches. For proper evaluation, we propose a new evaluation scheme and demonstrate that it reveals the true learning performance of different approaches more faithfully than previous evaluation methods.
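To make the MRCP idea concrete, the following is a minimal sketch, not the local search method proposed in the paper. It assumes the usual reading of MRCP as the profile supported on the restricted strategy set that minimizes regret with respect to the full game. For simplicity it also assumes a symmetric two-player matrix game and symmetric mixed profiles. It approximates the MRCP by brute-force grid search over the simplex. The function names (`regret`, `approx_mrcp`, `integer_compositions`) and the rock-paper-scissors payoffs are illustrative choices, not taken from the paper.

```python
def regret(payoff, sigma):
    """Regret of the symmetric mixed profile sigma in the FULL game.

    payoff[i][j] is the payoff for playing strategy i against strategy j.
    Regret is the best pure-deviation payoff minus the payoff of sigma itself.
    """
    n = len(payoff)
    deviation = [sum(payoff[i][j] * sigma[j] for j in range(n)) for i in range(n)]
    expected = sum(sigma[i] * deviation[i] for i in range(n))
    return max(deviation) - expected


def integer_compositions(parts, total):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in integer_compositions(parts - 1, total - first):
            yield (first,) + rest


def approx_mrcp(payoff, restricted, resolution=300):
    """Grid-search approximation of the MRCP over the `restricted` strategies.

    Only strategies in `restricted` may receive probability mass, but regret
    is measured against deviations to ANY strategy of the full game.
    """
    n = len(payoff)
    best_sigma, best_regret = None, float("inf")
    for counts in integer_compositions(len(restricted), resolution):
        sigma = [0.0] * n
        for strategy, count in zip(restricted, counts):
            sigma[strategy] = count / resolution
        r = regret(payoff, sigma)
        if r < best_regret:
            best_sigma, best_regret = sigma, r
    return best_sigma, best_regret


# Illustrative full game: rock (0), paper (1), scissors (2).
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

# The empirical game so far contains only rock and paper.
mrcp_sigma, mrcp_regret = approx_mrcp(RPS, restricted=[0, 1])

# The Nash equilibrium of the restricted game is pure paper (paper beats rock).
restricted_ne = [0.0, 1.0, 0.0]
ne_regret = regret(RPS, restricted_ne)

print("approximate MRCP:", mrcp_sigma, "full-game regret:", mrcp_regret)
print("restricted-game NE:", restricted_ne, "full-game regret:", ne_regret)
```

In this toy case the restricted-game equilibrium (pure paper) has full-game regret 1, while the best profile over the same two strategies mixes rock and paper at roughly 1/3 and 2/3 and has regret of about 1/3. The gap illustrates why a strategy space is better judged by its best available profile than by one particular solution of the restricted game. Grid search scales poorly with the size of the restricted set, which is consistent with the abstract's remark that computing MRCP is not always feasible in large games.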
