Playing Repeated Coopetitive Polymatrix Games with Small Manipulation Cost

Repeated coopetitive games capture situations in which an agent must efficiently balance cooperation and competition with the other agents over time in order to win the game (e.g., to become the player with the highest total utility). Achieving this balance is typically very challenging, or even impossible, when explicit communication is not feasible (e.g., when negotiation or bargaining is not allowed). In this paper, we investigate how an agent can achieve this balance and win in repeated coopetitive polymatrix games without explicit communication. In particular, we consider a 3-player repeated game setting in which our agent is allowed to (slightly) manipulate the underlying game matrices of the other agents, for which she pays a manipulation cost, while the other agents satisfy weak behavioural assumptions. We first propose a payoff matrix manipulation scheme and a sequence of strategies for our agent that provably guarantee that the utility of any opponent converges to a value we desire. We then use this scheme to design winning policies for our agent, and prove that these policies can be found in polynomial time. We then demonstrate the efficiency of our framework in several concrete coopetitive polymatrix games, and prove that the manipulation costs needed to win are bounded above by small budgets. For instance, in the social distancing game, a polymatrix version of the lemonade stand coopetitive game, we showcase a policy with an infinitesimally small manipulation cost per round, along with a provable guarantee that this policy leads our agent to win in the long run. Note that our findings can be trivially extended to $n$-player game settings as well (with $n > 3$).
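To make the setting concrete, the following is a minimal sketch of a single round of a 3-player polymatrix game in which one opponent's payoff matrix is slightly altered. It uses the standard polymatrix definition, where player i's utility is the sum over the other players j of x_i^T A_ij x_j. The abstract does not say how the manipulation cost is measured, so the entrywise absolute difference used here is an assumption. The function names, the anti-coordination matrices, and the strategies are hypothetical and chosen only for illustration; this is not the paper's manipulation scheme.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Strategy = List[float]


def bilinear(x: Strategy, a: Matrix, y: Strategy) -> float:
    """Expected payoff x^T A y for mixed strategies x (rows) and y (columns)."""
    return sum(x[r] * a[r][c] * y[c] for r in range(len(x)) for c in range(len(y)))


def polymatrix_utilities(
    payoffs: Dict[Tuple[int, int], Matrix], strategies: List[Strategy]
) -> List[float]:
    """Utility of each player: sum of its pairwise payoffs against every other player.

    payoffs[(i, j)] is player i's payoff matrix against player j,
    with rows indexed by i's actions and columns by j's actions.
    """
    n = len(strategies)
    return [
        sum(bilinear(strategies[i], payoffs[(i, j)], strategies[j])
            for j in range(n) if j != i)
        for i in range(n)
    ]


def manipulation_cost(original: Matrix, manipulated: Matrix) -> float:
    """ASSUMPTION: cost is the sum of absolute entry changes (not taken from the paper)."""
    return sum(
        abs(m - o)
        for row_o, row_m in zip(original, manipulated)
        for o, m in zip(row_o, row_m)
    )


# Hypothetical example: every pairwise game is anti-coordination
# (payoff 1 when the two players pick different actions, else 0).
anti = [[0.0, 1.0], [1.0, 0.0]]
payoffs = {(i, j): anti for i in range(3) for j in range(3) if i != j}

# Player 0 is "our agent"; players 1 and 2 are the opponents.
strategies = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
before = polymatrix_utilities(payoffs, strategies)

# Our agent lowers one entry of player 1's matrix against player 0.
manipulated_10 = [[0.0, 1.0], [0.75, 0.0]]
manipulated_payoffs = dict(payoffs)
manipulated_payoffs[(1, 0)] = manipulated_10
after = polymatrix_utilities(manipulated_payoffs, strategies)
cost = manipulation_cost(anti, manipulated_10)

print(before, after, cost)
```

In this toy round, the manipulation changes only player 1's utility and leaves the other two unchanged, which illustrates why altering an opponent's matrices can shift the ranking of total utilities without changing anyone's strategy.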
