Optimizing strategic decisions (a.k.a. computing equilibrium) is key to the success of many non-cooperative multi-agent applications. However, in many real-world situations we face the inverse of this game-theoretic problem: instead of prescribing the equilibrium of a given game, we directly observe the agents' equilibrium behavior and want to infer the underlying parameters of an unknown game. This research question, known as inverse game theory, has been studied in several recent works in the context of Stackelberg games. Unfortunately, existing works report largely negative results, establishing both statistical and computational hardness under the assumption that the follower behaves perfectly rationally. Our work relaxes the perfect-rationality assumption to the classic quantal response model, a more realistic model of boundedly rational behavior. Interestingly, we show that the smoothness introduced by this bounded rationality model leads to provably more efficient learning of the follower's utility parameters in general Stackelberg games. Systematic experiments on synthesized games confirm our theoretical results and further suggest that the approach is robust beyond the strict quantal response model.
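To make the quantal response model concrete, the following is a minimal sketch of its common logit form, in which the follower picks action j with probability proportional to exp(λ · U_j(x)), where U_j(x) is the follower's expected utility for action j against the leader's mixed strategy x. The function name `quantal_response`, the parameter `rationality` (λ), and the 2x2 utility matrix are illustrative assumptions, not taken from the paper.

```python
import math

def quantal_response(leader_strategy, follower_utils, rationality=1.0):
    """Logit quantal response: P(j) is proportional to exp(rationality * U_j(x)).

    follower_utils[i][j] is the follower's utility when the leader plays
    action i and the follower plays action j (a hypothetical layout).
    """
    n_actions = len(follower_utils[0])
    # Expected follower utility of each action j under the leader's mixed strategy x.
    expected = [
        sum(x_i * row[j] for x_i, row in zip(leader_strategy, follower_utils))
        for j in range(n_actions)
    ]
    # Subtract the largest exponent before exponentiating, for numerical stability.
    top = max(rationality * u for u in expected)
    weights = [math.exp(rationality * u - top) for u in expected]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical follower utility matrix (rows: leader actions, columns: follower actions).
V = [[1.0, 0.0],
     [0.0, 2.0]]
x = [0.5, 0.5]  # leader mixes uniformly over its two actions

probs = quantal_response(x, V, rationality=1.0)
# In the logit form, log-odds between two actions equal rationality times the
# difference in expected utilities, so observed frequencies carry utility information.
log_odds = math.log(probs[1] / probs[0])
```

As `rationality` grows, the distribution concentrates on the best response, which recovers the perfectly rational follower; at `rationality=0` the follower plays uniformly at random. One intuition for why smoothness helps (offered here as an illustration, not as the paper's argument) is that every action is played with positive probability, so observed response frequencies reveal utility differences rather than only the identity of the best response.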