This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems. Recent advances show promising results for model-free DRL-based methods in power systems, but model-free methods suffer from poor sample efficiency and long training times, both of which are critical for making state-of-the-art DRL algorithms practically applicable. A DRL agent learns an optimal policy by trial and error while interacting with the real-world environment, and it is desirable to minimize the agent's direct interaction with the real-world power grid because of the grid's safety-critical nature. Additionally, state-of-the-art DRL-based policies are mostly trained using a physics-based grid simulator in which dynamic simulation is computationally intensive, lowering training efficiency. We propose a novel model-based DRL framework in which a deep neural network (DNN)-based dynamic surrogate model, instead of a real-world power grid or a physics-based simulation, is used within the policy learning framework, making the training process faster and more sample-efficient. However, stabilizing model-based DRL is challenging because of the complex system dynamics of large-scale power systems. We address these challenges by incorporating imitation learning for a warm start in policy learning, reward shaping, and a multi-step surrogate loss. Finally, we achieved 97.5% sample efficiency and 87.7% training efficiency in an application to the IEEE 300-bus test system.
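To make the multi-step surrogate loss concrete, the sketch below shows one plausible PyTorch-style formulation: a DNN surrogate of the grid dynamics is unrolled for several steps from a recorded trajectory and penalized on the accumulated prediction error. All names (SurrogateModel, multi_step_surrogate_loss), network sizes, and dimensions are illustrative assumptions and do not reflect the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical DNN surrogate of the grid dynamics: maps (state, action) -> next state.
class SurrogateModel(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def multi_step_surrogate_loss(model, states, actions, horizon):
    """Unroll the surrogate for `horizon` steps from states[:, 0] and
    accumulate the prediction error against the recorded trajectory.
    states: (batch, horizon + 1, state_dim), actions: (batch, horizon, action_dim).
    """
    pred = states[:, 0]
    loss = 0.0
    for k in range(horizon):
        pred = model(pred, actions[:, k])  # predict the next state from the model's own prediction
        loss = loss + ((pred - states[:, k + 1]) ** 2).mean()
    return loss / horizon


if __name__ == "__main__":
    # Toy dimensions for illustration only.
    state_dim, action_dim, horizon, batch = 8, 2, 5, 32
    model = SurrogateModel(state_dim, action_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Random stand-ins for trajectories collected from a physics-based simulator.
    states = torch.randn(batch, horizon + 1, state_dim)
    actions = torch.randn(batch, horizon, action_dim)

    loss = multi_step_surrogate_loss(model, states, actions, horizon)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once such a surrogate is trained, policy rollouts can be generated against it instead of the physics-based simulator, which is the source of the sample- and training-efficiency gains the abstract reports.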