Reinforcement learning remains one of the major directions in the contemporary development of control engineering and machine learning. An intuitive formulation, flexible problem settings, and ease of application are among the many perks of this methodology. From the standpoint of machine learning, the main strength of a reinforcement learning agent is its ability to ``capture'' (learn) the optimal behavior in the given environment. Typically, the agent is built on neural networks, and it is their approximation capabilities that give rise to this belief. From the standpoint of control engineering, however, reinforcement learning has serious deficiencies. The most significant one is the lack of a stability guarantee for the agent-environment closed loop. A great deal of research has been, and continues to be, devoted to stabilizing reinforcement learning. When it comes to stability, the celebrated Lyapunov theory is the de facto tool. It is thus no wonder that so many techniques for stabilizing reinforcement learning rely on Lyapunov theory in one way or another. In control theory, there is an intricate connection between a stabilizing controller and a Lyapunov function. Employing such a pair thus seems quite attractive for designing stabilizing reinforcement learning. However, computation of a Lyapunov function is generally a cumbersome process. In this note, we show how to construct a stabilizing reinforcement learning agent that does not employ such a function at all. We only assume that a Lyapunov function exists, which is a natural thing to do if the given system (read: environment) is stabilizable, but we do not need to compute one.
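For readers unfamiliar with the controller-Lyapunov-function connection mentioned above, the following is a minimal illustrative sketch, not taken from the note itself; the notation ($f$, $\pi$, $V$, $\alpha_i$) is assumed here for the sake of example. For a closed loop $x_{k+1} = f\bigl(x_k, \pi(x_k)\bigr)$ under a policy $\pi$, a standard discrete-time Lyapunov function $V$ certifies stability if, roughly,
% Illustrative only: standard Lyapunov bounds and decrease condition,
% not the specific construction of the note.
\[
  \alpha_1(\|x\|) \le V(x) \le \alpha_2(\|x\|),
  \qquad
  V\bigl(f(x, \pi(x))\bigr) - V(x) \le -\alpha_3(\|x\|),
\]
where $\alpha_1, \alpha_2, \alpha_3$ are class-$\mathcal{K}_\infty$ functions. The point of the note is that such a $V$ need only exist; the agent itself does not have to compute or even represent it.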