We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kucera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kucera parameterization based entirely on input-output exploration data. Using a neural network to express a parameterized set of nonlinear stable operators enables seamless integration with standard deep learning libraries. We demonstrate the approach on a realistic simulation of a two-tank system.
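To make the role of the Youla-Kucera parameterization concrete, the following is a minimal sketch of its internal-model form for a stable plant. It is not the authors' implementation. Everything in it is an assumption chosen for illustration: the first-order plant with coefficients `a` and `b`, the constant output disturbance, the function name `simulate_imc`, and the parameter `Q`, which is a stable first-order filter followed by `tanh` standing in for a neural network. The internal model here is an exact copy of the plant rather than a data-driven model built from input-output data.

```python
import math


def simulate_imc(q_gain, q_pole, steps=200, a=0.9, b=0.1,
                 disturbance=0.5, reference=1.0):
    """Simulate a loop in internal-model (Youla-Kucera) form for a stable plant.

    Hypothetical plant:  y[k+1] = a*y[k] + b*u[k], with |a| < 1
    Internal model:      the same recursion, driven by the same input u
    Q parameter:         stable filter (|q_pole| < 1) followed by tanh
    Returns the list of measured outputs (plant output plus disturbance).
    """
    y = 0.0        # plant state
    y_model = 0.0  # internal model state
    q_state = 0.0  # state of the stable filter inside Q
    outputs = []
    for _ in range(steps):
        measured = y + disturbance
        # With an exact model, this signal contains only the disturbance,
        # so the feedback path does not depend on the control input.
        mismatch = measured - y_model
        error = reference - mismatch
        # Stable nonlinear Q: low-pass filter, then a bounded nonlinearity.
        q_state = q_pole * q_state + (1.0 - q_pole) * error
        u = q_gain * math.tanh(q_state)
        outputs.append(measured)
        # Plant and model receive the same input.
        y = a * y + b * u
        y_model = a * y_model + b * u
    return outputs


if __name__ == "__main__":
    # The output stays bounded however large the gain inside Q is.
    for gain in (0.1, 10.0, 1000.0):
        trajectory = simulate_imc(q_gain=gain, q_pole=0.5)
        print(gain, max(abs(v) for v in trajectory))
```

Because the model cancels the plant's contribution to the feedback signal, the loop behaves like the series connection of `Q` and the plant, so any stable `Q` yields a bounded response. This is why a search over the parameters of `Q` (here `q_gain` and `q_pole`) can be left to an optimizer such as a reinforcement learning algorithm without risking instability, under the stated assumption of an exact model.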