In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and on the use of the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, which relaxes the policy during learning. We investigate a well-known battery storage problem and show that the proposed method delivers more homogeneous and faster learning than a classical policy gradient approach.
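To make the relaxation idea concrete: the sketch below is a minimal illustration, not the paper's exact formulation. It assumes a scalar input constrained to [0, 1] with a linear stage cost a·u, whose minimizer is bang-bang (u* jumps between 0 and 1 as the cost slope a changes sign). Adding an interior-point log-barrier on the input bounds, weighted by a barrier parameter tau, yields a smooth interior minimizer; driving tau toward zero along the homotopy recovers the bang-bang solution.

```python
import numpy as np

def relaxed_policy(a, tau, n=100001):
    """Minimize a*u - tau*(log(u) + log(1-u)) over u in (0, 1) by grid search.

    a   : slope of the (purely economic) linear stage cost; hypothetical example.
    tau : interior-point barrier weight; as tau -> 0 the minimizer approaches
          the bang-bang policy u* = 1 if a < 0 else 0.
    """
    u = np.linspace(1e-6, 1.0 - 1e-6, n)
    cost = a * u - tau * (np.log(u) + np.log(1.0 - u))
    return u[np.argmin(cost)]

# A strong barrier (tau = 1) gives a policy that varies smoothly with a,
# so gradients with respect to a are informative; a weak barrier
# (tau = 1e-3) is already nearly bang-bang.
for tau in (1.0, 1e-3):
    print([round(relaxed_policy(a, tau), 3) for a in (-2.0, 0.0, 2.0)])
```

In a homotopy scheme, one would run policy gradient steps at a large tau, then progressively shrink tau so the learned policy approaches the true (nearly) bang-bang optimum while the gradients remain meaningful throughout.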