We study the open question of how players learn to play a social optimum pure-strategy Nash equilibrium (PSNE) through repeated interactions in general-sum coordination games. A social optimum of a game is the stable Pareto-optimal state that provides a maximum return in the sum of all players' payoffs (social welfare) and always exists. We consider finite repeated games where each player only has access to its own utility (or payoff) function but is able to exchange information with other players. We develop a novel regret matching (RM) based algorithm for computing an efficient PSNE solution that could approach a desired Pareto-optimal outcome yielding the highest social welfare among all the attainable equilibria in the long run. Our proposed learning procedure follows the regret minimization framework but extends it in three major ways: (1) agents use global, instead of local, utility for calculating regrets, (2) each agent maintains a small and diminishing exploration probability in order to explore various PSNEs, and (3) agents stay with the actions that achieve the best global utility thus far, regardless of regrets. We prove that these three extensions enable the algorithm to select the stable social optimum equilibrium instead of converging to an arbitrary or cyclic equilibrium as in the conventional RM approach. We demonstrate the effectiveness of our approach through a set of applications in multi-agent distributed control, including a large-scale resource allocation game and a hard combinatorial task assignment problem for which no efficient (polynomial) solution exists.
翻译:暂无翻译