We study repeated two-player games in which one player, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner may depend on a type that is drawn from a publicly known distribution but revealed privately to the learner. We address the following questions: (a) what is the minimum payoff that the optimizer can guarantee, regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer's payoff at this minimum? (c) can these algorithms be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, which could be of independent interest in other settings.