The goal of this paper is to make a strong case for using dynamical models when applying reinforcement learning (RL) to feedback control of dynamical systems governed by partial differential equations (PDEs). To bridge the gap between the immense promise of RL and its applicability to complex engineering systems, the main challenges are the massive data requirements for training and the lack of performance guarantees. We address the first issue using a data-driven surrogate model in the form of a convolutional LSTM with actuation. We demonstrate that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of data that must be sampled from the real system. Furthermore, we show that iteratively updating the model is of major importance to avoid biases in the RL training. Detailed ablation studies reveal the most important ingredients of the modeling process. We use the chaotic Kuramoto-Sivashinsky equation to demonstrate our findings.
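Since the abstract does not spell out the architecture, the following is a minimal sketch of what an actuated convolutional LSTM cell for a discretized 1D state such as the Kuramoto-Sivashinsky solution could look like in PyTorch. All names and hyperparameters (state_ch, act_ch, hidden_ch, kernel) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ActuatedConvLSTMCell(nn.Module):
    """One step of a convolutional LSTM whose input is the discretized
    1D PDE state concatenated with the actuation signal.
    A minimal sketch; the paper's actual architecture may differ."""

    def __init__(self, state_ch=1, act_ch=1, hidden_ch=32, kernel=5):
        super().__init__()
        # All four LSTM gates from one convolution over [state, action, hidden].
        self.gates = nn.Conv1d(state_ch + act_ch + hidden_ch,
                               4 * hidden_ch, kernel, padding=kernel // 2)

    def forward(self, x, a, h, c):
        # x: (B, state_ch, L)   PDE state on the spatial grid
        # a: (B, act_ch, L)     actuation, broadcast onto the grid
        # h, c: (B, hidden_ch, L) recurrent hidden and cell states
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, a, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Tiny smoke test on a 64-point grid.
if __name__ == "__main__":
    B, L, H = 2, 64, 32
    cell = ActuatedConvLSTMCell(hidden_ch=H)
    x = torch.randn(B, 1, L)       # current state u(x, t)
    a = torch.randn(B, 1, L)       # current actuation
    h = torch.zeros(B, H, L)
    c = torch.zeros(B, H, L)
    h, c = cell(x, a, h, c)
    print(h.shape)                 # torch.Size([2, 32, 64])
```

Feeding the actuation as an extra input channel lets the surrogate predict the controlled dynamics, so the RL agent can be trained largely on cheap model rollouts, while the model is refit on fresh real-system data between training rounds, the iterative updating the abstract identifies as essential for avoiding biases.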