In this work, we propose a novel data-driven, real-time power system voltage control method based on a physics-informed guided meta evolutionary strategy (ES). The main objective is to quickly provide an adaptive control strategy that mitigates the fault-induced delayed voltage recovery (FIDVR) problem. Reinforcement learning (RL) methods have been developed for the same or similar challenging control problems, but they suffer from training inefficiency and a lack of robustness in "corner or unseen" scenarios. Meanwhile, extensive physical knowledge has been developed in power systems, but little of it has been leveraged in learning-based approaches. To address these challenges, we introduce a trainable action mask technique that flexibly embeds physical knowledge into RL models to rule out unnecessary or unfavorable actions, achieving notable improvements in sample efficiency, control performance, and robustness. Furthermore, our method leverages past learning experience to derive a surrogate gradient that guides and accelerates the exploration process in training. Case studies on the IEEE 300-bus system and comparisons with other state-of-the-art benchmark methods demonstrate the effectiveness and advantages of our method.
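The idea of using a mask to rule out unfavorable actions can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the mask is parameterized or trained, so the example below assumes a fixed boolean mask applied to a discrete policy's logits, and the function name `masked_policy` and the four-action example are hypothetical.

```python
import math


def masked_policy(logits, allowed):
    """Softmax over `logits`, restricted to actions flagged True in `allowed`.

    Assumption: the mask is a fixed boolean list. The trainable mask
    described in the abstract is not reproduced here.
    """
    if len(logits) != len(allowed):
        raise ValueError("logits and allowed must have the same length")
    kept = [x for x, ok in zip(logits, allowed) if ok]
    if not kept:
        raise ValueError("at least one action must be allowed")
    peak = max(kept)  # subtract the largest kept logit for numerical stability
    weights = [math.exp(x - peak) if ok else 0.0
               for x, ok in zip(logits, allowed)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: four candidate control actions, with action 2
# ruled out by physical knowledge even though it has the largest logit.
probs = masked_policy([0.5, 1.0, 3.0, -0.2], [True, True, False, True])
```

Masked actions receive exactly zero probability, so the policy never samples them during exploration, and the remaining probabilities are renormalized over the allowed actions only.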