We study the game redesign problem, in which an external designer can change the payoff function in each round but incurs a design cost for deviating from the original game. The players apply no-regret learning algorithms to repeatedly play the changed games with limited feedback. The goals of the designer are to (i) incentivize all players to take a specific target action profile frequently; and (ii) incur a small cumulative design cost. We present game redesign algorithms with the guarantee that the target action profile is played in T - o(T) rounds while incurring only o(T) cumulative design cost. Game redesign describes both positive and negative applications: a benevolent designer who incentivizes players to take a target action profile with higher social welfare than the solution of the original game, or a malicious attacker whose target action profile benefits the attacker but not the players. Simulations on four classic games confirm the effectiveness of our proposed redesign algorithms.
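The following Python sketch illustrates the setting described in the abstract; it is not the paper's algorithm. Everything in it is an assumption made for illustration: the Prisoner's Dilemma payoff values, the penalty `GAP`, the L1 design cost, the use of EXP3 as the no-regret learner, and all function and class names. The hypothetical designer shows each player a payoff that equals the original payoff at the target profile and is lowered by `GAP` whenever that player leaves its target action. The target action is then strictly dominant, and the design cost is zero in every round where the target profile is played.

```python
import math
import random

# Hypothetical 2-player Prisoner's Dilemma with payoffs scaled to [0, 1].
# Actions: 0 = cooperate, 1 = defect.
# ORIGINAL[profile] = (payoff to player 0, payoff to player 1).
ORIGINAL = {
    (0, 0): (0.6, 0.6),
    (0, 1): (0.0, 1.0),
    (1, 0): (1.0, 0.0),
    (1, 1): (0.2, 0.2),
}
TARGET = (0, 0)  # assumed target action profile: mutual cooperation
GAP = 0.5        # assumed penalty for leaving one's target action


def redesigned_payoff(profile):
    """Payoffs shown to the players: the target action is strictly dominant,
    and the payoff at the target profile equals the original one."""
    base = ORIGINAL[TARGET]
    return tuple(
        base[i] - (GAP if profile[i] != TARGET[i] else 0.0)
        for i in range(len(profile))
    )


def design_cost(profile):
    """Assumed cost: L1 distance between original and shown payoffs
    at the profile that was actually played."""
    shown = redesigned_payoff(profile)
    return sum(abs(o - s) for o, s in zip(ORIGINAL[profile], shown))


class Exp3:
    """Standard EXP3 bandit learner for rewards in [0, 1]."""

    def __init__(self, n_actions, horizon, rng):
        self.k = n_actions
        self.gamma = min(
            1.0,
            math.sqrt(n_actions * math.log(n_actions) / ((math.e - 1) * horizon)),
        )
        self.weights = [1.0] * n_actions
        self.rng = rng

    def act(self):
        total = sum(self.weights)
        self.last_probs = [
            (1 - self.gamma) * w / total + self.gamma / self.k
            for w in self.weights
        ]
        self.last_action = self.rng.choices(range(self.k), weights=self.last_probs)[0]
        return self.last_action

    def update(self, reward):
        # Importance-weighted reward estimate for the action just played.
        estimate = reward / self.last_probs[self.last_action]
        self.weights[self.last_action] *= math.exp(self.gamma * estimate / self.k)
        # Rescale so the weights cannot overflow over long horizons.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]


def simulate(horizon=5000, seed=0):
    """Return (rounds in which TARGET was played, cumulative design cost)."""
    rng = random.Random(seed)
    players = [Exp3(2, horizon, rng) for _ in TARGET]
    hits, cost = 0, 0.0
    for _ in range(horizon):
        profile = tuple(p.act() for p in players)
        shown = redesigned_payoff(profile)
        for player, reward in zip(players, shown):
            player.update(reward)  # bandit feedback: own payoff only
        hits += int(profile == TARGET)
        cost += design_cost(profile)
    return hits, cost
```

Calling `simulate()` returns the number of rounds in which the target profile was played and the cumulative design cost. Because the cost in this sketch is nonzero only in off-target rounds, it is bounded by a constant times the number of such rounds, which mirrors the link between the two guarantees stated in the abstract.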