Payoff landscapes and the robustness of selfish optimization in iterated games

In iterated games, a player can unilaterally exert influence over the outcome through a careful choice of strategy. A powerful class of such "payoff control" strategies was discovered by Press and Dyson in 2012. Their so-called "zero-determinant" (ZD) strategies allow a player to unilaterally enforce a linear relationship between both players' payoffs. It was subsequently shown that when the slope of this linear relationship is positive, ZD strategies are robustly effective against a selfishly optimizing co-player, in that all adapting paths of the selfish player lead to the maximal payoffs for both players (at least when there are certain restrictions on the game parameters). In this paper, we investigate the efficacy of selfish learning against a fixed player in more general settings, for both ZD and non-ZD strategies. We first prove that in any symmetric 2x2 game, the selfish player's final strategy must be of a certain form and cannot be fully stochastic. We then show that there are prisoner's dilemma interactions for which the robustness result does not hold when one player uses a fixed ZD strategy with positive slope. We give examples of selfish adapting paths that lead to locally but not globally optimal payoffs, undermining the robustness of payoff control strategies. For non-ZD strategies, these pathologies arise regardless of the original restrictions on the game parameters. Our results illuminate the difficulty of implementing robust payoff control and selfish optimization, even in the simplest context of playing against a fixed strategy.
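The linear payoff relationship enforced by a ZD strategy can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the conventional prisoner's dilemma payoffs T, R, P, S = 5, 3, 1, 0, memory-one strategies written as cooperation probabilities after the outcomes (CC, CD, DC, DD), and a commonly used parametrization in which player X enforces pi_X - kappa = chi * (pi_Y - kappa). The function names and the parameters chi, phi, and kappa are hypothetical choices made for this example. Exact rational arithmetic is used so that the relationship can be tested with equality rather than a tolerance.

```python
from fractions import Fraction as F

# Assumed prisoner's dilemma payoffs (T > R > P > S), chosen for illustration.
T, R, P, S = F(5), F(3), F(1), F(0)


def zd_strategy(chi, phi, kappa):
    """Memory-one strategy (p_CC, p_CD, p_DC, p_DD) for player X that enforces
    pi_X - kappa = chi * (pi_Y - kappa). Each state lists X's move first."""
    s_x = (R, S, T, P)  # X's payoff in each state
    s_y = (R, T, S, P)  # Y's payoff in each state
    p_tilde = [phi * ((a - kappa) - chi * (b - kappa)) for a, b in zip(s_x, s_y)]
    # p_tilde equals p minus (1, 1, 0, 0), so add 1 back to the first two entries.
    p = [p_tilde[0] + 1, p_tilde[1] + 1, p_tilde[2], p_tilde[3]]
    if not all(0 <= x <= 1 for x in p):
        raise ValueError("parameters do not give valid probabilities")
    return p


def stationary(p, q):
    """Exact stationary distribution over (CC, CD, DC, DD) when X plays p and
    Y plays q. q is indexed from Y's own viewpoint (Y's move first).
    Assumes the resulting Markov chain has a unique stationary distribution."""
    q_x = (q[0], q[2], q[1], q[3])  # Y's cooperation probabilities in X's state order
    M = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_x)]
    # Solve v M = v with sum(v) = 1: three rows of (M^T - I) plus a
    # normalization row, stored as an augmented matrix.
    A = [[M[j][i] - (1 if i == j else 0) for j in range(4)] + [F(0)]
         for i in range(3)]
    A.append([F(1)] * 4 + [F(1)])
    # Gauss-Jordan elimination over the rationals.
    for col in range(4):
        pivot = next(r for r in range(col, 4) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(4):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][4] for i in range(4)]


def payoffs(p, q):
    """Long-run average payoffs (pi_X, pi_Y) in the undiscounted iterated game."""
    v = stationary(p, q)
    pi_x = sum(a * b for a, b in zip(v, (R, S, T, P)))
    pi_y = sum(a * b for a, b in zip(v, (R, T, S, P)))
    return pi_x, pi_y


if __name__ == "__main__":
    chi = F(3)
    p = zd_strategy(chi, phi=F(1, 26), kappa=P)  # positive-slope example
    for q in ([F(1, 2)] * 4, [F(9, 10), F(1, 5), F(3, 4), F(1, 10)]):
        pi_x, pi_y = payoffs(p, q)
        # Both sides agree for every co-player strategy q tried here.
        print(pi_x - P, chi * (pi_y - P))
```

Setting kappa to P gives an extortionate strategy and setting kappa to R gives a generous one; in both cases the slope chi is positive, which is the regime the robustness question concerns. The sketch only verifies the enforced relationship against fixed co-players and does not model the selfish adapting paths studied in the paper.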
