Reinforcement learning (RL) is widely recognized as lacking generalization and robustness under environmental perturbations, which severely restricts its application to real-world robotics. Prior work has claimed that adding regularization to the value function is equivalent to learning a robust policy under uncertain transitions. Although this regularization-robustness transformation is appealing for its simplicity and efficiency, it remains underexplored in continuous control tasks. In this paper, we propose a new regularizer named the $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR), formulated via an uncertainty set over the parameter space of the transition function. In particular, USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach that generates them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark, demonstrating improved robust performance in perturbed testing environments.