This paper describes how the domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively train agents that control grid topology to prevent thermal cascading. Typical RL-based topology controllers perform poorly because of the large search/optimization space. Here, we propose an actor-critic-based agent to address the problem's combinatorial nature, and we train the agent using the RL environment developed by RTE, the French transmission system operator (TSO). To tackle the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure, modifying the environment using network physics to enhance agent learning. Further, a parallel training approach over multiple scenarios is employed to avoid biasing the agent toward a few scenarios and to make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed on most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems into real-world RL training. The agent was tested by RTE in the 2019 Learning to Run a Power Network (L2RPN) challenge and was awarded 2nd place for accuracy and 1st place for speed. The developed code is open-sourced for public use.
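As a rough, non-authoritative sketch of the training procedure summarized above (curriculum-based reward tuning combined with interleaved training over many scenarios), the following Python snippet shows one way such a loop could be organized. All names here (`CurriculumConfig`, `shaped_reward`, `make_env`, the `overflow` field, and the toy agent and environment) are illustrative placeholders under a Gym-style interface, not the paper's actual implementation or the RTE environment's API.

```python
import random
from dataclasses import dataclass

@dataclass
class CurriculumConfig:
    """Hypothetical schedule: the thermal-overflow penalty is ramped up
    during training so early exploration is not dominated by large
    negative rewards."""
    start_penalty: float = 0.1
    end_penalty: float = 1.0
    ramp_steps: int = 50_000

def shaped_reward(base_reward: float, overflow: float, step: int,
                  cfg: CurriculumConfig) -> float:
    """Curriculum-tuned reward: the environment's base reward minus a
    thermal-overflow penalty whose weight grows as training progresses."""
    frac = min(step / cfg.ramp_steps, 1.0)
    penalty = cfg.start_penalty + frac * (cfg.end_penalty - cfg.start_penalty)
    return base_reward - penalty * overflow

def train(agent, make_env, scenario_ids, total_steps, cfg):
    """Interleave many scenarios so the agent is not biased toward a few
    of them; each scenario keeps its own environment instance."""
    envs = {sid: make_env(sid) for sid in scenario_ids}
    obs = {sid: env.reset() for sid, env in envs.items()}
    for step in range(total_steps):
        sid = random.choice(scenario_ids)          # sample a scenario
        action = agent.act(obs[sid])
        next_obs, reward, done, info = envs[sid].step(action)
        reward = shaped_reward(reward, info.get("overflow", 0.0), step, cfg)
        agent.learn(obs[sid], action, reward, next_obs, done)
        obs[sid] = envs[sid].reset() if done else next_obs

class _ToyEnv:
    """Stand-in environment with a Gym-style API (placeholder only)."""
    def reset(self):
        return 0.0
    def step(self, action):
        return 0.0, 1.0, random.random() < 0.01, {"overflow": random.random()}

class _RandomAgent:
    """Stand-in agent; a real actor-critic learner would go here."""
    def act(self, obs):
        return random.randrange(4)
    def learn(self, *transition):
        pass

if __name__ == "__main__":
    train(_RandomAgent(), lambda sid: _ToyEnv(),
          scenario_ids=list(range(8)), total_steps=1_000,
          cfg=CurriculumConfig(ramp_steps=500))
```

The design choice sketched here mirrors the abstract's two ideas: the penalty ramp stands in for curriculum-based reward tuning, and sampling a fresh scenario each step stands in for parallel multi-scenario training.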