Learning Nash equilibria is a central problem in multi-agent systems. In this paper, we investigate congestion games, a class of games with benign theoretical structure and broad real-world applications. We first propose a centralized algorithm based on the optimism in the face of uncertainty principle for congestion games with (semi-)bandit feedback, and obtain finite-sample guarantees. Then we propose a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design. By exploiting the structure of the congestion game, we show the sample complexity of both algorithms depends only polynomially on the number of players and the number of facilities, but not the size of the action set, which can be exponentially large in terms of the number of facilities. We further define a new problem class, Markov congestion games, which allows us to model the non-stationarity in congestion games. We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.
翻译:学习 Nash 平衡是多试剂系统的一个中心问题。 在本文中, 我们调查拥堵游戏, 这是一种具有良性理论结构和广泛现实应用的游戏。 我们首先在面对( 半) 山地反馈的堵塞游戏的不确定性原则时, 提出基于乐观的集中算法, 并获得有限比例的保障。 然后我们通过弗兰克- 沃夫方法和 G- 最佳设计的新颖组合提出分散算法。 通过利用拥堵游戏的结构, 我们显示两种算法的样本复杂性仅取决于玩家数量和设施数量, 而不是行动集的规模, 而行动集的规模在设施数量上可能非常大。 我们还进一步定义了一个新的问题类, 马科夫 堵塞游戏, 使我们能够模拟拥堵游戏中的不常态性。 我们为Markov 拥堵游戏提出一种集中算法, 其样本复杂性再次仅对所有相关问题参数有多重依赖, 而不是行动集的规模 。