We propose a parameterization of a nonlinear dynamic controller based on the recurrent equilibrium network, a generalization of the recurrent neural network. We derive constraints on the parameterization under which the controller guarantees exponential stability of a partially observed dynamical system with sector-bounded nonlinearities. Finally, we present a method to synthesize this controller using projected policy gradient methods to maximize a reward function with arbitrary structure. The projection step involves the solution of convex optimization problems. We demonstrate the proposed method with simulated examples of controlling nonlinear plants, including plants modeled with neural networks.
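To make the projected-gradient idea concrete, the following is a minimal toy sketch and not the method of the paper. It assumes a scalar linear plant `x+ = a*x + u` with a static gain `u = -k*x` in place of the recurrent equilibrium network controller, a finite-difference gradient in place of a policy gradient estimator, and a decay-rate condition `|a - k| <= rho` in place of the derived stability constraints. For this scalar case the feasible set is an interval, so the convex projection reduces to clipping. All names and numerical values (`rollout_reward`, `project_stable`, `a`, `rho`, `u_weight`) are hypothetical.

```python
# Toy illustration only: scalar plant, static gain, interval projection.
A_PLANT = 1.2   # assumed open-loop unstable plant coefficient
RHO = 0.5       # assumed required closed-loop decay rate, |a - k| <= RHO


def rollout_reward(k, a=A_PLANT, x0=1.0, steps=30, u_weight=5.0):
    """Reward of one rollout: negative quadratic cost of x+ = a*x + u, u = -k*x."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = -k * x
        total -= x * x + u_weight * u * u
        x = a * x + u
    return total


def project_stable(k, a=A_PLANT, rho=RHO):
    """Euclidean projection of k onto the convex set {k : |a - k| <= rho}."""
    return min(max(k, a - rho), a + rho)


def projected_gradient_ascent(k0, lr=0.01, iters=300, eps=1e-5):
    """Alternate a gradient ascent step on the reward with a projection step."""
    k = project_stable(k0)
    for _ in range(iters):
        # Central finite difference stands in for a policy gradient estimate.
        grad = (rollout_reward(k + eps) - rollout_reward(k - eps)) / (2 * eps)
        k = project_stable(k + lr * grad)
    return k


k_star = projected_gradient_ascent(k0=A_PLANT)
print("gain:", k_star, "closed-loop pole:", A_PLANT - k_star)
```

In this toy setting the reward alone would favor a smaller gain than the decay-rate condition allows, so the projection step is what keeps every iterate inside the stabilizing set. In the setting described above, the projection would instead require solving a convex optimization problem over the controller parameters.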