In recent years, empirical game-theoretic analysis (EGTA) has emerged as a powerful tool for analyzing games in which an exact specification of the utilities is unavailable. Instead, EGTA assumes access to an oracle, i.e., a simulator, which can generate unbiased noisy samples of players' unknown utilities, given a strategy profile. Utilities can thus be empirically estimated by repeatedly querying the simulator. Recently, various progressive sampling (PS) algorithms have been proposed that aim to produce PAC-style learning guarantees (e.g., approximate Nash equilibria with high probability) using as few simulator queries as possible. One recent work introduced a pruning technique called regret pruning, which further reduces the number of simulator queries issued by PS algorithms that aim to learn pure Nash equilibria. In this paper, we address a serious limitation of this original regret pruning approach: it can only guarantee that true pure Nash equilibria of the empirical game are approximate equilibria of the true game, and it provides no strong guarantees for approximate pure Nash equilibria. This is a significant limitation, since in many games pure Nash equilibria are computationally intractable to find, or do not even exist. We introduce three novel regret pruning variations. The first two generalize the original approach to yield guarantees for approximate pure Nash equilibria of the empirical game. The third goes further, yielding strong guarantees for all approximate mixed Nash equilibria of the empirical game. We use these variations to design two novel progressive sampling algorithms, PS-REG+ and PS-REG-M, which experimentally outperform the previous state-of-the-art algorithms for learning pure and mixed equilibria, respectively, of simulation-based games.
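To make the setting concrete, the following is a minimal sketch of the simulation-based pipeline the abstract describes: utilities of a small normal-form game are estimated by averaging repeated queries to a noisy simulator, and epsilon-approximate pure Nash equilibria of the resulting empirical game are identified by computing each profile's regret. Everything here is an assumption for illustration: the Gaussian noise model, the fixed per-profile sample count (the paper's PS algorithms instead adapt sample sizes progressively), and the helper names `simulate`, `estimate_utilities`, and `regret` are not the paper's PS-REG+ or PS-REG-M procedures.

```python
import numpy as np

# Hypothetical noisy simulator: returns an unbiased noisy sample of every
# player's utility at a given pure-strategy profile (Gaussian noise assumed).
def simulate(payoffs, profile, rng, noise=0.5):
    true_u = payoffs[profile]  # true utilities at this profile, shape (n_players,)
    return true_u + rng.normal(0.0, noise, size=true_u.shape)

def estimate_utilities(payoffs, n_strategies, n_players, samples, rng):
    """Build the empirical game by averaging repeated simulator queries."""
    est = np.zeros((n_strategies,) * n_players + (n_players,))
    for profile in np.ndindex(*([n_strategies] * n_players)):
        draws = [simulate(payoffs, profile, rng) for _ in range(samples)]
        est[profile] = np.mean(draws, axis=0)
    return est

def regret(est, profile, n_strategies):
    """Largest gain any player can obtain by a unilateral deviation;
    a profile is an eps-approximate pure Nash equilibrium iff this is <= eps."""
    worst = 0.0
    for p in range(len(profile)):
        u = est[profile][p]
        for s in range(n_strategies):
            dev = list(profile)
            dev[p] = s
            worst = max(worst, est[tuple(dev)][p] - u)
    return worst

# Usage: enumerate eps-approximate pure Nash equilibria of the empirical game.
rng = np.random.default_rng(0)
n_players, n_strategies, eps = 2, 3, 0.25
payoffs = rng.uniform(0, 1, (n_strategies,) * n_players + (n_players,))
est = estimate_utilities(payoffs, n_strategies, n_players, samples=200, rng=rng)
approx_pne = [pr for pr in np.ndindex(*([n_strategies] * n_players))
              if regret(est, pr, n_strategies) <= eps]
print(approx_pne)
```

The `regret` quantity computed here is exactly the object that regret pruning reasons about: profiles whose regret in the empirical game can be bounded away from zero need no further simulator queries.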