A reinforcement learning (RL) policy trained in a nominal environment can fail in a new or perturbed environment because of dynamic variations. Existing robust methods seek a single fixed policy for all envisioned dynamic-variation scenarios through robust or adversarial training. Such methods can yield conservative performance because of their emphasis on the worst case, and they often require tedious modifications to the training environment. We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control. Leveraging the capability of an $\mathcal{L}_1$ control law to quickly estimate and actively compensate for dynamic variations, our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world. Numerical experiments validate the efficacy of the proposed approach.
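To make the idea concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of augmenting a pre-trained RL policy with an $\mathcal{L}_1$ adaptive loop on a scalar control-affine plant $\dot{x} = f(x) + u + \sigma(t)$, where $\sigma$ is an unknown matched dynamic variation. The predictor with a piecewise-constant adaptation law estimates $\sigma$, and a low-pass-filtered compensation input is added to the RL action. All names, gains, and dynamics here are assumptions for illustration.

```python
import math

dt = 0.001      # sampling period of the L1 loop (assumed)
A_s = -10.0     # Hurwitz gain of the state predictor (assumed)
omega = 50.0    # bandwidth of the low-pass filter on the compensation (assumed)

def f(x):
    """Nominal dynamics the RL policy was trained against (illustrative)."""
    return -x

def rl_policy(x):
    """Stand-in for a pre-trained, non-robust RL policy (illustrative)."""
    return -0.5 * x

def sigma(t):
    """Unknown dynamic variation the L1 augmentation must compensate."""
    return 2.0 * math.sin(3.0 * t)

x, x_hat, u_ad = 0.0, 0.0, 0.0
pred_errs = []
for k in range(20000):
    t = k * dt
    u = rl_policy(x) + u_ad            # RL action plus L1 compensation

    # Piecewise-constant adaptation law: from the prediction error,
    # back out the disturbance estimate that explains it over one interval.
    x_tilde = x_hat - x
    mu = math.exp(A_s * dt)
    sigma_hat = -A_s * mu / (mu - 1.0) * x_tilde

    # Low-pass filter the negated estimate to form the compensation input.
    u_ad += dt * omega * (-sigma_hat - u_ad)

    # State predictor and true plant (forward-Euler integration).
    x_hat += dt * (f(x) + u + sigma_hat + A_s * x_tilde)
    x += dt * (f(x) + u + sigma(t))
    pred_errs.append(abs(x_tilde))

print(f"max recent prediction error: {max(pred_errs[-1000:]):.4f}")
```

The filter bandwidth `omega` makes the trade-off explicit: a larger bandwidth compensates faster but passes more high-frequency estimation content to the plant, which is the decoupling of estimation from compensation that distinguishes $\mathcal{L}_1$ adaptive control.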