Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.
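To make the contextual-policy idea concrete, the following is a minimal sketch (not the paper's implementation, which uses deep multi-agent RL) of the two-phase scheme the abstract describes: a follower policy conditioned on the leader's commitment is trained across many sampled leader strategies, so that it approximates a best response to any leader, after which the leader optimizes against this learned responder. The bimatrix game, the payoff matrices `A` and `B`, and the linear-softmax parameterization are all hypothetical choices for illustration.

```python
# Minimal sketch of Stackelberg equilibrium search with a contextual follower.
# All payoffs and the policy parameterization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical payoff matrices: rows = leader actions, cols = follower actions.
A = np.array([[1.0, 3.0], [2.0, 1.0]])   # leader payoffs
B = np.array([[1.0, 0.0], [0.0, 2.0]])   # follower payoffs
n_leader, n_follower = A.shape

def follower_logits(W, x):
    """Contextual policy: logits over follower actions given leader mix x."""
    return W @ x  # W has shape (n_follower, n_leader)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Phase 1: train the contextual follower across sampled leader commitments.
# Each sampled x plays the role of one "task" in the multitask/meta-RL sense.
W = np.zeros((n_follower, n_leader))
lr = 0.5
for _ in range(5000):
    x = rng.dirichlet(np.ones(n_leader))        # random leader commitment
    probs = softmax(follower_logits(W, x))
    a = rng.choice(n_follower, p=probs)         # sampled follower action
    reward = x @ B[:, a]                        # follower's expected payoff
    grad_logp = -probs
    grad_logp[a] += 1.0                         # d log pi(a) / d logits
    W += lr * reward * np.outer(grad_logp, x)   # REINFORCE-style update

# Phase 2: the leader optimizes against the learned best-responder.
best_x, best_val = None, -np.inf
for x0 in np.linspace(0.0, 1.0, 101):           # grid over leader mixes
    x = np.array([x0, 1.0 - x0])
    br = softmax(follower_logits(W, x))         # follower's contextual response
    val = x @ A @ br                            # leader's expected payoff
    if val > best_val:
        best_x, best_val = x, val

print("leader commitment:", best_x, "leader payoff:", best_val)
```

The design point this sketch tries to surface is the one the abstract highlights: because the follower is trained once across the whole space of leader commitments rather than retrained per leader update, the leader's outer search reuses the same responder, which is where the sample-efficiency gain comes from.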