This study provides a new convergence mechanism for learning in games. Learning in games studies how multiple agents maximize their own rewards through repeated play of a game. In two-player zero-sum games in particular, where the agents compete directly, each agent's reward depends on the opponent's strategy. A critical problem therefore arises when both agents update their strategies with standard algorithms such as replicator dynamics or gradient ascent: the learning dynamics often cycle and fail to converge to the optimal strategies, i.e., the Nash equilibrium. We tackle this problem from a novel perspective: asymmetry between the agents' learning algorithms. Specifically, we consider games with memory, where each agent can store previously played actions and condition its subsequent actions on them, and we focus on asymmetry in the agents' memory capacities. We demonstrate, both theoretically and experimentally, that the learning dynamics converge to the Nash equilibrium when the agents have different memory capacities. Moreover, we give an interpretation of this convergence: the agent with the longer memory can use a more complex strategy, which endows the other agent's utility with strict concavity.
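As a minimal illustration of the cycling problem described above (a sketch, not the paper's construction), the following Python snippet simulates simultaneous gradient ascent in matching pennies, a standard two-player zero-sum game. The payoff matrix, initial strategies, and learning rate are assumptions chosen for the example; the printed distance to the mixed Nash equilibrium (1/2, 1/2) does not shrink, showing the non-convergence that the memory-asymmetry mechanism is meant to resolve.

```python
# Minimal sketch: simultaneous gradient ascent in matching pennies.
# Not the paper's experiment; payoffs, initial point, and step size are assumed.
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # player 1's payoff matrix (zero-sum)
x, y = 0.9, 0.1   # probability each player assigns to action 0
eta = 0.05        # learning rate (assumed value)

for t in range(2001):
    p, q = np.array([x, 1 - x]), np.array([y, 1 - y])
    gx = (A @ q)[0] - (A @ q)[1]      # d u1 / d x, with u1 = p^T A q
    gy = -((p @ A)[0] - (p @ A)[1])   # d u2 / d y, with u2 = -u1 (zero-sum)
    x = np.clip(x + eta * gx, 0.0, 1.0)  # projected ascent step for player 1
    y = np.clip(y + eta * gy, 0.0, 1.0)  # projected ascent step for player 2
    if t % 500 == 0:
        print(f"t={t:4d}  x={x:.3f}  y={y:.3f}  "
              f"dist to NE={np.hypot(x - 0.5, y - 0.5):.3f}")
```

Under these assumed settings the trajectory orbits the equilibrium rather than approaching it, which matches the cycling behavior of standard memoryless dynamics described in the abstract.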