Reinforcement learning (RL) algorithms have been successfully used to develop control policies for dynamical systems. For many such systems, these policies are trained in a simulated environment. Due to discrepancies between the simulated model and the true system dynamics, RL-trained policies often fail to generalize and adapt appropriately when deployed in the real-world environment. Current research on bridging this sim-to-real gap has largely focused on improvements in simulation design and on the development of specialized RL algorithms for robust control-policy generation. In this paper we apply principles from adaptive control and system identification to develop the model-reference adaptive control & reinforcement learning (MRAC-RL) framework. We propose a set of novel MRAC algorithms applicable to a broad range of linear and nonlinear systems, and derive the associated control laws. The MRAC-RL framework utilizes an inner-loop adaptive controller that allows a simulation-trained outer-loop policy to adapt and operate effectively in a test environment, even when parametric model uncertainty exists. We demonstrate that the MRAC-RL approach improves upon state-of-the-art RL algorithms in developing control policies that can be applied to systems with modeling errors.
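To illustrate the inner-loop mechanism, a minimal textbook MRAC example for a scalar plant is sketched below; it is not the specific control law derived in this paper, and all symbols ($a$, $b$, $a_m$, $b_m$, $r$, $\hat{\theta}_x$, $\hat{\theta}_r$, $\gamma$) are illustrative:
\[
\dot{x} = a x + b u, \qquad \dot{x}_m = a_m x_m + b_m r, \qquad e = x - x_m,
\]
\[
u = \hat{\theta}_x x + \hat{\theta}_r r, \qquad
\dot{\hat{\theta}}_x = -\gamma\, e\, x\, \operatorname{sgn}(b), \qquad
\dot{\hat{\theta}}_r = -\gamma\, e\, r\, \operatorname{sgn}(b).
\]
Here the adaptive gains compensate online for the unknown plant parameters $a$ and $b$ so that the state $x$ tracks the reference-model state $x_m$; in the MRAC-RL setting, the command $r$ would be supplied by the simulation-trained outer-loop RL policy.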