Recent studies show that, in a Stackelberg game, the follower can manipulate the leader by deviating from their true best-response behavior. Such manipulations are computationally tractable and can be highly beneficial for the follower, while they may cause significant payoff losses for the leader, sometimes completely defeating the leader's first-mover advantage. These findings are a warning to commitment optimizers, but the risk they indicate appears to be alleviated to some extent by the strict information advantage on which the manipulations rely: the follower knows both players' payoffs in full, whereas the leader knows only their own. In this paper, we study the manipulation problem with this information advantage relaxed. We consider the scenario where the follower is given no information about the leader's payoffs to begin with and must learn to manipulate by interacting with the leader. The follower can gather the necessary information by querying the leader's optimal commitments against contrived best-response behaviors. Our results indicate that the information advantage is not entirely indispensable to the follower's manipulations: the follower can learn the optimal way to manipulate in polynomial time, using polynomially many queries of the leader's optimal commitment.
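To make the query model concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a toy 2x2 game with hypothetical payoffs, restricts the leader to pure commitments, and has the follower enumerate every contrived response map by brute force, which takes exponentially many queries in general rather than the polynomially many the paper achieves. The names `leader_oracle`, `true_best_response`, and `learn_manipulation` are illustrative only. The follower never reads the leader's payoffs; it only observes which commitment the oracle returns.

```python
from itertools import product

# Hypothetical toy payoffs (not from the paper): rows are leader actions,
# columns are follower actions.
LEADER_PAYOFF = [[5, 0], [3, 1]]    # hidden from the follower
FOLLOWER_PAYOFF = [[1, 0], [0, 3]]  # known to the follower


def leader_oracle(response):
    """Leader's optimal pure commitment against a reported response map.

    response[i] is the column the follower claims to play against row i.
    Simplifying assumption: the leader commits to pure strategies only.
    """
    rows = range(len(LEADER_PAYOFF))
    return max(rows, key=lambda i: LEADER_PAYOFF[i][response[i]])


def true_best_response(follower_payoff):
    """Honest response map: the follower's payoff-maximizing column per row."""
    return tuple(
        max(range(len(row)), key=lambda j: row[j]) for row in follower_payoff
    )


def learn_manipulation(oracle, follower_payoff):
    """Query the oracle with every contrived response map (brute force).

    Returns (response_map, induced_row, follower_payoff_obtained). The
    follower is assumed to actually play the response it reported.
    """
    n_rows = len(follower_payoff)
    n_cols = len(follower_payoff[0])
    best = None
    for response in product(range(n_cols), repeat=n_rows):
        row = oracle(response)  # one query of the leader's optimal commitment
        value = follower_payoff[row][response[row]]
        if best is None or value > best[2]:
            best = (response, row, value)
    return best
```

In this toy instance the honest response map is `(0, 1)`, the leader commits to row 0, and the payoffs are 5 for the leader and 1 for the follower. By always reporting column 1, the follower induces row 1 and obtains 3, while the leader's payoff falls to 1.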