Stackelberg equilibria arise naturally in a range of popular learning problems, such as security games or automated mechanism design, and have recently received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space admits approaches not previously seen in the literature, for instance leveraging multitask and meta-RL techniques for follower convergence. We experimentally evaluate examples of novel approaches predicted by our framework on standard benchmark domains. Finally, we discuss directions for future work suggested by our framework.