This article explores which parameters of the repeated Prisoner's Dilemma lead to cooperation. Using simulations, I demonstrate that the potential function of the stochastic evolutionary dynamics of the Grim Trigger strategy helps predict cooperation between Q-learners. The frontier separating the regions of the parameter space that induce cooperation from those that induce defection can be determined from the kinetic energy exerted by the respective basins of attraction. When the incentive compatibility constraint of the Grim Trigger strategy is slack, the observed cooperation rates increase suddenly as the ratio of the kinetic energies approaches a critical value. This critical value is a function of the discount factor, multiplied by a correction factor that accounts for the effect of the algorithms' exploration probability. Using metadata from laboratory experiments, I provide evidence that the insights obtained from the simulations also help explain the emergence of cooperation between humans. The observed cooperation rates show a positive gradient at the frontier characterized by an exploration probability of approximately five percent. In the context of human-to-human interaction, the exploration probability can be viewed as a belief about the probability that the opponent deviates from the equilibrium action.
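As a point of reference for the incentive compatibility constraint mentioned above, the following is a minimal sketch of the textbook Grim Trigger condition, not the article's potential-function or kinetic-energy computation. It assumes the standard payoff labels T (temptation), R (reward) and P (punishment) with T > R > P, which the abstract does not define, and the numeric payoffs in the usage note are purely illustrative. Under these assumptions, cooperation is sustainable when the discount factor is at least (T - R) / (T - P), and the constraint is slack when the inequality is strict.

```python
def grim_trigger_threshold(T: float, R: float, P: float) -> float:
    """Smallest discount factor at which Grim Trigger sustains cooperation.

    T, R, P are the assumed standard Prisoner's Dilemma payoffs
    (temptation, reward, punishment); they are not taken from the article.
    """
    if not (T > R > P):
        raise ValueError("expected payoffs ordered as T > R > P")
    return (T - R) / (T - P)


def ic_slack(delta: float, T: float, R: float, P: float) -> float:
    """Discounted gain from cooperating forever over deviating once.

    A positive value means the incentive compatibility constraint is slack,
    zero means it binds, and a negative value means it is violated.
    """
    if not (0.0 <= delta < 1.0):
        raise ValueError("expected a discount factor in [0, 1)")
    cooperate_forever = R / (1.0 - delta)
    # Deviate today (earn T), then face mutual defection (P) in every later period.
    deviate_once = T + delta * P / (1.0 - delta)
    return cooperate_forever - deviate_once
```

With the illustrative payoffs T = 5, R = 3, P = 1, `grim_trigger_threshold(5, 3, 1)` gives a threshold of 0.5, so `ic_slack` is positive for any discount factor above one half and negative below it.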