PowerNet: Multi-agent Deep Reinforcement Learning for Scalable Powergrid Control

This paper develops an efficient multi-agent deep reinforcement learning algorithm for cooperative control in powergrids. Specifically, we consider the decentralized inverter-based secondary voltage control problem in distributed generators (DGs), which we first formulate as a cooperative multi-agent reinforcement learning (MARL) problem. We then propose a novel on-policy MARL algorithm, PowerNet, in which each agent (DG) learns a control policy based on a (sub-)global reward but only on local states from its neighboring agents. Motivated by the fact that a local control action from one agent has limited impact on agents distant from it, we introduce a novel spatial discount factor that reduces the effect of remote agents, which expedites training and improves scalability. Furthermore, a differentiable, learning-based communication protocol is employed to foster collaboration among neighboring agents. In addition, to mitigate the effects of system uncertainty and of the random noise introduced during on-policy learning, we use an action smoothing factor to stabilize policy execution. To facilitate training and evaluation, we develop PGSim, an efficient, high-fidelity powergrid simulation platform. Experimental results in two microgrid setups show that PowerNet outperforms a conventional model-based control method as well as several state-of-the-art MARL algorithms. The decentralized learning scheme and high sample efficiency also make it viable for large-scale power grids.
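The abstract names the spatial discount factor and the action smoothing factor but gives no formulas for them. The sketch below is one plausible reading, not the authors' implementation. It assumes that each agent's reward is a sum of all agents' rewards weighted by `alpha ** d`, where `d` is the hop distance in the communication graph, and that smoothing is a convex blend of the previous and the newly proposed action. The names `hop_distances`, `spatially_discounted_reward`, `smooth_action`, `alpha`, and `rho` are hypothetical, as is the 4-DG line topology.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: hop count from `source` to every reachable agent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def spatially_discounted_reward(adjacency, rewards, agent, alpha):
    """Assumed form: sum of rewards weighted by alpha ** hop_distance.

    alpha = 0 keeps only the agent's own reward (fully local);
    alpha = 1 recovers the plain global sum.
    """
    dist = hop_distances(adjacency, agent)
    return sum((alpha ** d) * rewards[j] for j, d in dist.items())


def smooth_action(previous, proposed, rho):
    """Assumed form: convex blend of the previous and proposed action.

    rho = 0 applies the new action directly; larger rho damps changes.
    """
    return rho * previous + (1.0 - rho) * proposed


# Hypothetical line topology of four DGs: 0 - 1 - 2 - 3
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rewards = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}

r0 = spatially_discounted_reward(adjacency, rewards, agent=0, alpha=0.5)
a1 = smooth_action(previous=1.0, proposed=0.0, rho=0.5)
```

Under these assumptions, agent 0's reward weights its one-, two-, and three-hop neighbors by 0.5, 0.25, and 0.125. A distant DG therefore contributes little to the learning signal, which is the intuition the abstract gives for faster training.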
