Reinforcement learning (RL) provides a model-free approach to designing optimal controllers for nonlinear dynamical systems. However, the learning process requires a large number of trial-and-error experiments on the poorly controlled system and accumulates wear and tear on the plant. It is therefore desirable to maintain some degree of control performance during learning. In this paper, we propose a model-free two-step design approach that improves the transient learning performance of RL in an optimal regulator design problem for unknown nonlinear systems. Specifically, a linear control law pre-designed in a model-free manner is applied in parallel with online RL to ensure a certain level of performance in the early stage of learning. Numerical simulations show that the proposed method improves both the transient learning performance and the efficiency of hyperparameter tuning for RL.
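The parallel architecture described above can be summarized as applying the sum of a fixed, pre-designed linear feedback and the output of the RL policy to the plant. The following is a minimal sketch of that structure, not the paper's actual implementation: the pendulum-like plant, the gain K, and the policy update rule are all illustrative placeholders, since the abstract does not specify them.

```python
import numpy as np

def plant_step(x, u, dt=0.02):
    """One Euler step of a hypothetical nonlinear plant, x = [angle, rate]."""
    theta, omega = x
    domega = -np.sin(theta) - 0.1 * omega + u
    return np.array([theta + dt * omega, omega + dt * domega])

# Pre-designed linear law (assumed obtained model-free, e.g. from
# input/output data); the gain K here is an illustrative placeholder.
K = np.array([3.0, 1.5])

def u_linear(x):
    return -K @ x

class RLPolicy:
    """Stand-in for the online RL controller updated during operation."""
    def __init__(self, dim):
        self.w = np.zeros(dim)  # policy parameters, adapted online

    def act(self, x):
        return float(self.w @ x)

    def update(self, x, reward):
        # Placeholder update; the actual learning rule is whatever
        # RL algorithm the paper employs.
        self.w += 1e-3 * reward * x

# Parallel combination: total input = linear safeguard + RL term.
policy = RLPolicy(2)
x = np.array([1.0, 0.0])
for step in range(500):
    u = u_linear(x) + policy.act(x)           # parallel control law
    x_next = plant_step(x, u)
    reward = -(x_next @ x_next + 0.1 * u**2)  # quadratic cost as reward
    policy.update(x, reward)
    x = x_next
```

The design intent is that the fixed linear term keeps the closed loop acceptably regulated while the RL term is still poorly trained, which is what yields the improved transient learning performance reported in the simulations.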