Reinforcement Learning based Proactive Control for Transmission Grid Resilience to Wildfire

Power grid operation during an extreme event requires human operators to make decisions under stressful conditions and high cognitive load. Decision support during adverse dynamic events, especially forecasted ones, can be supplemented by intelligent proactive control. Power system operation during wildfires requires resiliency-driven proactive control for load shedding, line switching, and resource allocation that accounts for the dynamics of the wildfire and of failure propagation. However, the number of possible line- and load-switching actions in a large system during an event makes traditional prediction-driven and stochastic approaches computationally intractable, so operators often resort to greedy algorithms. We model and solve the proactive control problem as a Markov decision process and introduce an integrated testbed for spatio-temporal wildfire propagation and proactive power-system operation. We transform the enormous wildfire-propagation observation space and use it as part of a heuristic for proactive de-energization of transmission assets. We integrate this heuristic with a reinforcement-learning-based proactive policy for controlling the generating assets. Our approach allows this controller to provide setpoints for part of the generation fleet while a myopic operator determines the setpoints for the remaining units, resulting in a symbiotic action. We evaluate our approach on the IEEE 24-node system mapped onto a hypothetical terrain. Our results show that the proposed approach can help the operator reduce load loss during an extreme event, reduce power flow through lines scheduled for de-energization, and reduce the likelihood of infeasible power-flow solutions, which would indicate violation of the short-term thermal limits of transmission lines.
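The abstract describes a split action: a heuristic driven by a compressed view of the fire decides which transmission lines to de-energize, a learned policy sets outputs for part of the generation fleet, and a myopic operator dispatches the rest. The sketch below illustrates only that division of labor and is not the authors' implementation. Everything in it is an assumption: the per-line time-to-fire observation, the `horizon` threshold, the fill-in-order operator, and `toy_policy`, a hypothetical stand-in for a trained RL agent. It ignores the network entirely (no power flow, no thermal limits), so `shed` is only a supply/demand shortfall.

```python
from typing import Callable, Dict, Set, Tuple

Setpoints = Dict[str, float]  # generator name -> active-power setpoint (MW)


def lines_to_deenergize(time_to_fire: Dict[str, float], horizon: float) -> Set[str]:
    # Hypothetical heuristic: the large fire-state observation is reduced to one
    # number per line (estimated steps until the fire front reaches it), and any
    # line the fire is expected to reach within `horizon` steps is switched off.
    return {line for line, t in time_to_fire.items() if t <= horizon}


def myopic_dispatch(demand: float, fixed: Setpoints,
                    capacity: Setpoints) -> Tuple[Setpoints, float]:
    # Hypothetical myopic operator: covers the demand left over by the
    # RL-controlled units, filling its own units in listed order up to capacity.
    residual = max(demand - sum(fixed.values()), 0.0)
    setpoints: Setpoints = {}
    for gen, cap in capacity.items():
        p = min(cap, residual)
        setpoints[gen] = p
        residual -= p
    return setpoints, residual  # residual > 0 is load that cannot be served


def toy_policy(obs: Dict[str, float]) -> Setpoints:
    # Stand-in for a trained RL policy (hypothetical): fixed setpoints for G1, G2.
    return {"G1": 100.0, "G2": 50.0}


def symbiotic_step(obs: Dict[str, float], time_to_fire: Dict[str, float],
                   demand: float, rl_policy: Callable[[Dict[str, float]], Setpoints],
                   operator_capacity: Setpoints, horizon: float = 2.0):
    # One control step: heuristic line switching, RL setpoints for its share of
    # the fleet, then myopic setpoints for the remaining generators.
    open_lines = lines_to_deenergize(time_to_fire, horizon)
    rl_sp = rl_policy(obs)
    op_sp, shed = myopic_dispatch(demand, rl_sp, operator_capacity)
    return open_lines, {**rl_sp, **op_sp}, shed


if __name__ == "__main__":
    lines, sp, shed = symbiotic_step(
        obs={},
        time_to_fire={"B1": 1.0, "B2": 5.0, "B3": 2.0},
        demand=300.0,
        rl_policy=toy_policy,
        operator_capacity={"G3": 80.0, "G4": 40.0},
    )
    print(sorted(lines), sp, shed)
    # -> ['B1', 'B3'] {'G1': 100.0, 'G2': 50.0, 'G3': 80.0, 'G4': 40.0} 30.0
```

In this toy case the operator's units run at full capacity and still leave 30 MW unserved. That shortfall is the kind of load loss a better-coordinated learned policy would aim to reduce.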

翻译：受极端事件影响的电网操作需要人类操作者在高度认知负荷压力条件下做出决策。在不利的动态事件下提供决策支持,特别是如果预测的话,可以用明智的主动控制来补充。野火期间的电力系统操作需要恢复力驱动的主动控制,以便考虑到野火和故障扩散的动态,进行排泄、线转换和资源分配。然而,在大型系统中,线和负荷转换的可能数量使得传统的预测驱动和随机方法难以计算,导致操作者经常使用贪婪的算法。我们以马尔科夫为决策程序来模拟和解决主动控制问题,并引入一个综合的测试台,用于弹道-时热野火传播和主动动力系统操作。我们改造巨大的野火-调整观察空间,利用它作为热力节点的一部分,以便积极主动地减少传输资产的积极性学习政策。我们的方法允许这一控制者为一代机队的一部分提供设置点,同时,一个我的系统运行者可以决定我们系统流流流流的传输极限的极限值,一个预示着我们系统运行者在24度上显示的运行结果。