Reinforcement learning (RL) is a promising and emerging topic in automatic control applications. Whereas classical control approaches require a priori system knowledge, data-driven control approaches such as RL allow a model-free controller design procedure, making them attractive for systems with changing plant structures and varying parameters. While it has already been shown in various applications that RL can sufficiently handle the transient control behavior of complex systems, the challenge of non-vanishing steady-state control errors remains, which arises from the use of control policy approximations and finite training times. To overcome this issue, an integral action state augmentation (IASA) for actor-critic-based RL controllers is introduced that mimics an integrating feedback and is inspired by the delta-input formulation within model predictive control. This augmentation does not require any expert knowledge, leaving the approach model-free. As a result, the RL controller learns to suppress steady-state control deviations much more effectively. Two exemplary applications from the domain of electrical energy engineering validate the benefit of the developed method for both reference tracking and disturbance rejection. Compared to a standard deep deterministic policy gradient (DDPG) setup, the suggested IASA extension reduces the steady-state error by up to 52 $\%$ within the considered validation scenarios.
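To illustrate the core idea behind the IASA extension, the following Python sketch shows one possible way to combine the two ingredients named above: augmenting the agent's observation with the time-integrated control error and interpreting the actor output as an action increment (delta-input formulation) that is accumulated into the actual plant input. The class name, the assumed environment interface (reset() and step() returning the plant state together with the current control error), and all parameter names are hypothetical and serve only as a minimal illustration under these assumptions; they are not taken from the reference implementation.
\begin{verbatim}
import numpy as np

class IntegralActionStateAugmentation:
    """Minimal sketch of the IASA idea (hypothetical interface).

    1. The observation is augmented with the time-integrated control
       error, giving the actor/critic an integral-like feature.
    2. The actor outputs an action increment (delta input); the
       absolute plant input is obtained by accumulation, which
       mimics an integrating feedback path.
    """

    def __init__(self, env, dt, action_low, action_high):
        self.env = env                        # plant/environment (assumed interface)
        self.dt = dt                          # sampling time
        self.action_low = np.asarray(action_low, dtype=float)
        self.action_high = np.asarray(action_high, dtype=float)
        self.error_integral = 0.0             # running integral of the control error
        self.last_action = np.zeros_like(self.action_low)

    def reset(self):
        self.error_integral = 0.0
        self.last_action = np.zeros_like(self.action_low)
        obs, error = self.env.reset()         # assumed to return (state, control error)
        return self._augment(obs)

    def step(self, delta_action):
        # Delta-input formulation: accumulate the increment into the plant input.
        self.last_action = np.clip(self.last_action + delta_action,
                                   self.action_low, self.action_high)
        obs, error, reward, done = self.env.step(self.last_action)  # assumed signature
        # Integral action: discrete-time integration of the control error.
        self.error_integral += float(np.sum(error)) * self.dt
        return self._augment(obs), reward, done

    def _augment(self, obs):
        # Augmented observation fed to the DDPG actor and critic.
        return np.concatenate([np.atleast_1d(obs),
                               np.atleast_1d(self.error_integral),
                               np.atleast_1d(self.last_action)])
\end{verbatim}
Within a standard DDPG training loop, the actor and critic would then operate on this augmented observation, while the agent's output is treated as delta_action rather than as the absolute plant input.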