Robust Reinforcement Learning (RL) focuses on improving performance under model errors or adversarial attacks, which facilitates the real-life deployment of RL agents. Robust Adversarial Reinforcement Learning (RARL) is one of the most popular frameworks for robust RL. However, most of the existing literature models RARL as a zero-sum simultaneous game with Nash equilibrium as the solution concept, which can overlook the sequential nature of RL deployments, produce overly conservative agents, and induce training instability. In this paper, we introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack - to formalize the sequential nature and provide extra flexibility for robust training. We develop the Stackelberg Policy Gradient algorithm to solve RRL-Stack, leveraging the Stackelberg learning dynamics by considering the adversary's response. Our method generates challenging yet solvable adversarial environments that benefit the RL agent's robust learning. Our algorithm demonstrates better training stability and robustness against different testing conditions in single-agent robotics control and multi-agent highway merging tasks.
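The key idea behind such Stackelberg learning dynamics is that the leader (the RL agent) updates its policy while accounting for how the follower (the adversary) responds, rather than treating the adversary as fixed. As a minimal sketch in our own notation (not necessarily the paper's): with leader parameters $\theta$, follower parameters $\phi$, objectives $J_\ell$ and $J_f$, and follower best response $\phi^*(\theta) \in \arg\max_\phi J_f(\theta,\phi)$, one common way to instantiate such dynamics is to follow the leader's total derivative

\[
\frac{\mathrm{d}}{\mathrm{d}\theta} J_\ell\bigl(\theta,\phi^*(\theta)\bigr)
= \nabla_\theta J_\ell
+ \Bigl(\frac{\mathrm{d}\phi^*(\theta)}{\mathrm{d}\theta}\Bigr)^{\!\top} \nabla_\phi J_\ell,
\qquad
\frac{\mathrm{d}\phi^*(\theta)}{\mathrm{d}\theta}
\approx -\bigl(\nabla^2_{\phi\phi} J_f\bigr)^{-1} \nabla_\theta \nabla_\phi J_f,
\]

where the second expression follows from the implicit function theorem at the follower's optimum; the extra response term is what distinguishes Stackelberg dynamics from simultaneous (Nash) gradient play.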