A reinforcement learning (RL) control policy can fail in a new or perturbed environment that differs from the training environment, due to the presence of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach to robustifying a pre-trained RL policy: augmenting it with an $\mathcal{L}_{1}$ adaptive controller ($\mathcal{L}_{1}$AC). Leveraging the capability of an $\mathcal{L}_{1}$AC for fast estimation and active compensation of dynamic variations, the proposed approach improves the robustness of an RL policy that is trained either in a simulator or in the real world without considering a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods.
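The augmentation acts purely at the control-signal level: the command sent to the plant is the RL action plus a compensating term produced by the $\mathcal{L}_{1}$AC's state predictor, adaptation law, and low-pass filter. Below is a minimal Python sketch of one common $\mathcal{L}_{1}$AC formulation (piecewise-constant adaptation with a first-order filter) for a system with matched uncertainty $\dot{x} = f(x) + B(u + \sigma)$; the class name `L1Augmentation`, the default Hurwitz matrix $A_s = -10I$, and the filter bandwidth `omega_c` are illustrative assumptions, not the paper's tuned implementation.

```python
import numpy as np
from scipy.linalg import expm

class L1Augmentation:
    """Sketch of an L1 adaptive augmentation wrapped around a pre-trained
    RL policy, assuming matched uncertainty: x_dot = f(x) + B (u + sigma)."""

    def __init__(self, f, B, n, m, Ts, As=None, omega_c=20.0):
        self.f, self.B = f, B              # nominal dynamics x_dot = f(x) + B u
        self.Ts = Ts                       # controller sampling period
        self.As = As if As is not None else -10.0 * np.eye(n)  # Hurwitz matrix
        # Precompute Phi(Ts)^{-1} e^{As Ts}, with Phi(T) = As^{-1}(e^{As T} - I),
        # used by the piecewise-constant adaptation law.
        Phi = np.linalg.solve(self.As, expm(self.As * Ts) - np.eye(n))
        self.Phi_inv_expAsT = np.linalg.inv(Phi) @ expm(self.As * Ts)
        self.omega_c = omega_c             # bandwidth of low-pass filter C(s)
        self.x_hat = np.zeros(n)           # predictor state; call reset(x0) first
        self.u_l1 = np.zeros(m)

    def reset(self, x0):
        self.x_hat = np.array(x0, dtype=float)
        self.u_l1 = np.zeros_like(self.u_l1)

    def step(self, x, u_rl):
        # 1) Adaptation: estimate the matched uncertainty from the
        #    prediction error x_tilde = x_hat - x.
        x_tilde = self.x_hat - x
        sigma_hat = -np.linalg.pinv(self.B) @ (self.Phi_inv_expAsT @ x_tilde)
        # 2) Low-pass filter the compensation (discretized C(s) = w_c/(s+w_c)).
        alpha = np.exp(-self.omega_c * self.Ts)
        self.u_l1 = alpha * self.u_l1 + (1.0 - alpha) * (-sigma_hat)
        u = u_rl + self.u_l1               # add-on: RL action + L1 compensation
        # 3) State predictor, propagated by forward Euler.
        x_hat_dot = self.f(x) + self.B @ (u + sigma_hat) + self.As @ x_tilde
        self.x_hat = self.x_hat + self.Ts * x_hat_dot
        return u
```

At each control step one would call `u = l1.step(x, policy(x))` and apply `u` to the plant; the pre-trained policy itself is left untouched, which is what makes the approach add-on.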