A reinforcement learning (RL) control policy trained in a nominal environment can fail in a new or perturbed environment due to the presence of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach to robustifying a pre-trained RL policy by augmenting it with an $\mathcal{L}_{1}$ adaptive controller ($\mathcal{L}_{1}$AC). Leveraging the capability of an $\mathcal{L}_{1}$AC for fast estimation and active compensation of dynamic variations, the proposed approach improves the robustness of an RL policy that is trained either in a simulator or in the real world without accounting for a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods. A video of the experiments on a real Pendubot setup is available at https://youtu.be/xgOB9vpyUgE.
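For concreteness, a minimal sketch of the add-on architecture is given below for a scalar system $\dot{x}(t) = u(t) + \sigma(t)$, where $\sigma$ lumps the dynamic variations; a standard $\mathcal{L}_{1}$AC structure (state predictor, gradient-type adaptation law with projection omitted, and low-pass filter) is assumed here for illustration, and the symbols $a_s$, $\Gamma$, and $C(s)$ are illustrative assumptions rather than details stated above:
\begin{align}
u(t) &= u_{\mathrm{RL}}(t) + u_{\mathcal{L}_{1}}(t), \\
\dot{\hat{x}}(t) &= a_s\,\tilde{x}(t) + u(t) + \hat{\sigma}(t), \qquad \tilde{x}(t) \triangleq \hat{x}(t) - x(t), \quad a_s < 0, \\
\dot{\hat{\sigma}}(t) &= -\Gamma\,\tilde{x}(t), \\
u_{\mathcal{L}_{1}}(s) &= -C(s)\,\hat{\sigma}(s),
\end{align}
where $u_{\mathrm{RL}}$ is the action of the pre-trained RL policy, $\hat{\sigma}$ is the estimate of the lumped variation $\sigma$ driven by the state-prediction error $\tilde{x}$, and $C(s)$ is a strictly proper low-pass filter that restricts the compensation $u_{\mathcal{L}_{1}}$ to the frequency range where the estimate is reliable, so that the perturbed system seen by the RL policy stays close to the nominal one it was trained on.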