This paper investigates when one can efficiently recover an approximate Nash equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three types of feedback with decreasing levels of revealed information. Starting from facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a. bandit) feedback setting, interestingly, we show that the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that, under a generalized data coverage assumption from offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, game-level feedback, where only the total reward of all agents is revealed. Again, we show that the coverage assumption for the agent-level setting is insufficient for game-level feedback, whereas a stronger version of the data coverage assumption for linear bandits allows us to recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between the different types of feedback.