Stackelberg equilibria arise naturally in a range of popular learning problems, such as security games and indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for formulating Stackelberg equilibrium search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space admits approaches not previously seen in the literature, for instance leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting designs outside the borders of our framework.