A natural goal in multiagent learning, besides finding equilibria, is to learn rationalizable behavior, where players learn to avoid iteratively dominated actions. However, even in the basic setting of multiplayer general-sum games, existing algorithms require a number of samples exponential in the number of players to learn rationalizable equilibria under bandit feedback. This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE) whose sample complexities are polynomial in all problem parameters, including the number of players. To achieve this result, we also develop a new efficient algorithm for the simpler task of finding one rationalizable action profile (not necessarily an equilibrium), whose sample complexity substantially improves over the best existing results of Wu et al. (2021). Our algorithms incorporate several novel techniques to guarantee rationalizability and no-(swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates, which may be of independent interest. We complement our results with a sample complexity lower bound showing the sharpness of our guarantees.
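To make the notion of rationalizability concrete, the following is a minimal sketch, not taken from the paper, of iterated elimination of strictly dominated actions in a two-player normal-form game with full payoff information. The function name `eliminate_dominated`, the example payoff matrices, and the restriction to pure-strategy dominance are all illustrative assumptions; the paper's algorithms instead operate under bandit feedback and must cope with domination by mixed strategies.

```python
# Illustrative sketch only: iterated elimination of strictly dominated pure
# actions in a two-player game with known payoffs. The paper's setting
# (bandit feedback, many players, mixed-strategy domination) is harder.
import numpy as np

def eliminate_dominated(payoffs):
    """Repeatedly remove, for each player, any action strictly dominated by
    another pure action against all surviving opponent actions.

    payoffs: list of two arrays; payoffs[i][a0, a1] is player i's payoff
    when player 0 plays a0 and player 1 plays a1.
    Returns the surviving action sets (rationalizable under pure dominance).
    """
    surviving = [list(range(payoffs[0].shape[0])),
                 list(range(payoffs[0].shape[1]))]
    changed = True
    while changed:
        changed = False
        for i in (0, 1):
            own, opp = surviving[i], surviving[1 - i]
            for a in list(own):
                for b in own:
                    if b == a:
                        continue
                    # b strictly dominates a if b does strictly better
                    # against every surviving opponent action.
                    if i == 0:
                        dominates = all(payoffs[0][b, c] > payoffs[0][a, c]
                                        for c in opp)
                    else:
                        dominates = all(payoffs[1][c, b] > payoffs[1][c, a]
                                        for c in opp)
                    if dominates:
                        own.remove(a)
                        changed = True
                        break
    return surviving

# Hypothetical example: each player's second action is strictly dominated.
u0 = np.array([[3.0, 1.0],
               [2.0, 0.0]])
u1 = np.array([[3.0, 2.0],
               [1.0, 0.0]])
print(eliminate_dominated([u0, u1]))  # -> [[0], [0]]
```

A rationalizable equilibrium, in the paper's sense, is an equilibrium (CCE or CE) supported only on action profiles that survive such iterated elimination; the paper's contribution is learning these from samples with complexity polynomial in the number of players.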